Preface


For too long ‘applied mathematics’ in schools and universities has meant 
‘mechanics’. Although this area still has an important role to play, many students are 
turning away from it and are showing a growing interest in newer areas of 
applications of mathematics, such as those covered by the various syllabuses in 
A-level ‘Decision mathematics’. These topics appeal to students because the 
applications are to problems arising in commerce, information technology and the 
environment, and generally do not involve a knowledge of physical principles. 

The objective of this book is to communicate the flavour of some of these areas 
of recent applications of mathematics. It is intended to be read by students and their 
teachers on A-level courses at school or college, or in the first year of undergraduate
degree courses. I have taught in the classroom all the material which is
covered in this book, and it is interesting that this has steadily ‘moved down’ the 
curriculum. For example, I gave postgraduate lectures on linear programming in the 
early 1960s, yet this subject is now in Level 10 of the National Curriculum, as well 
as forming part of the A-level ‘Decision mathematics’ syllabus. In a similar way, 
I introduced control theory as an option for final-year mathematics undergraduates
25 years ago, whereas introductory courses are now taught at much lower levels. 
Part of the explanation for this moving down is the ready availability of
computing power, enabling the use of efficient algorithms to solve problems. 
Throughout I have tried to emphasize discrete models using difference equations 
and matrix representations, and to reduce the emphasis on calculus and differential 
equations. This is for two reasons: first, students generally find discrete mathematics
easier to grasp compared with the rather difficult concepts of calculus; and
secondly, the amount of calculus being studied in schools is likely to decrease 
further in future. I have deliberately not included any discussion of computer 
software packages such as DERIVE or MAPLE. Not every student has ready access 
to these, and in any case software is revised or replaced so frequently that textbook 
treatments can quickly become out of date. Furthermore, I believe that at a 
beginning stage you learn and absorb concepts and techniques more readily by 
actually tackling problems with no more to help you than a good mathematical 
pocket calculator. 




The book begins with a description of how measuring time in small steps leads 
to realistic models which do not involve calculus, and a number of applications using 
so-called difference equations are discussed. Methods for solving these equations are
explained, and Chapter 1 closes with an account of how matrix algebra can be used 
to deal with more complicated problems, especially population models of the natural 
world. 

Chapter 2 takes up the completely different topic of error-correcting codes. 
Applications include supermarket barcodes which identify products, the International 
Standard Book Number (ISBN), compact discs whose high quality of sound 
reproduction depends crucially on coding, and data sent from spacecraft which 
would be unintelligible but for appropriate coding. This is a particularly interesting 
area of contemporary applied mathematics, since so much of society now depends 
upon the accurate handling and transmission of information. 

Another feature of the modern world is the use of computers to control the 
behaviour of engineering and other systems — for example, the automatic gearbox of 
a motor vehicle, the automatic landing of an aircraft, or putting a satellite into the 
correct orbit. An introduction to some of the mathematics involved is given in 
Chapter 3. In fact, because many control models involve dynamics this is one part of 
the book where a knowledge of the basics of Newton’s laws of motion is useful. 
Although these models involve simple differential equations as well as difference 
equations, the main mathematical tool is matrix algebra. Required properties of 
matrices are developed as needed. 

Finally, various aspects of the ever-present problem of optimizing the use of 
limited resources are investigated in Chapter 4, including linear programming, 
transportation problems and networks. The mathematics is at relatively simple levels 
except in the very last section on optimal control, which continues from Chapter 3 
and can be regarded as something of a bridge to more advanced work. 

Each chapter contains many worked examples, together with exercises which 
you should attempt to solve as you go through the book. At the end of each chapter 
there are further problems which are a bit more challenging, and you should at least 
skim over them, as they often contain further illustrative applications. Answers to all 
the numerical parts of the exercises and problems are provided. Teachers will be glad 
to know that a manual of written-out solutions to the exercises and problems is 
available on request from the publishers, free of charge. Students should not, in their 
own best interests, use it to cheat! 

My style throughout is deliberately informal, with virtually nothing involving 
‘pure mathematics style’ proofs. I introduce mathematical techniques as and when 
required for applications, so as to reduce the need to refer elsewhere. Some 
mathematical colleagues may well be dismayed by the absence of formal proofs, and 
may claim that I am encouraging sloppy ways of mathematical learning. However, 
for me mathematics has always come alive through its applications, and if the book 
manages to get this over to readers then I will be satisfied; rigorous developments 
which do not excite interest (except amongst a dedicated few) are what gives 
mathematics a bad name! 


I wish to thank Brian Bunday, Tim Cronin, Mike Gover and Colin Storey for 
their helpful comments on first drafts, and Carolyn Barry for her excellent typing of 
the manuscript. 


Stephen Barnett 
Leeds, January 1995 
1.1 INTRODUCTION AND EXAMPLES


In real life, time is not measured continuously but in packets of a fixed amount, 
whether these be tenths of seconds, seconds, minutes, hours, days, months, years or 
whatever. For example, if you are ill with a fever then your temperature may be 
taken perhaps every hour — you certainly don’t lie in bed with a thermometer 
permanently sticking out of your mouth so that a nurse can measure your temperature
‘continuously’. In a similar way, if you have some money in a building society
or bank to which interest is being added, this is only done at regular intervals, 
perhaps annually or monthly — rarely is the interest added on a daily basis, so it’s no 
use checking every day to see whether your money’s growing! Similarly, economic 
statistics such as the rate of inflation or the unemployment total are usually released 
at monthly or even quarterly intervals. 

In this section we give some illustrations of situations where time is measured
discretely, that is in finite, distinct amounts or ‘steps’, so indeed ‘time marches on’,
to quote the name of an American cinema newsreel popular in the 1940s and 1950s, 
and occasionally repeated on TV for historical interest. 


2 Time Marches On 
EXAMPLE 1.1


A bank savings account pays interest at an annual percentage rate (APR) of r, 
which is ‘compounded’ n times per year, where n is an integer. For example, 
suppose that the APR is 5 compounded semi-annually, so that n=2. We use 
x(0) to denote the amount which you put in to open the account, since this is 
when we start measuring time. At the end of six months, that is one time 
period, it’s convenient to write x(1) for the amount in the account. This will 
consist of the initial deposit together with the interest it has gained over the 
half-year, at half the annual rate (2.5%), so we have

$$x(1) = x(0) + \frac{2.5}{100}\,x(0) = 1.025\,x(0)$$


If your money is left in for a further half-year, then ‘compound’ interest means 
that the whole of the new sum gains interest, not just the original deposit. Since 
two time periods have elapsed, the amount in the account is therefore 


x(2) = 1.025 x(1) 


and substituting for x(1) you can see that the amount in the account after a year 
(two time periods) has elapsed is 


$$x(2) = (1.025)^2\,x(0)$$


Let's see what happens in general when the year is divided up into n equal 
parts. At the end of each period the interest is added to the account at a rate of 
r/100n (since r is a percentage) on the balance at the beginning of the period.
Let x(k) be the amount in the account at the end of the kth period, where k is
the time variable which can take the values 1, 2, 3, .... After k time intervals the
amount in the account is the previous balance together with the interest it has 
earned, so that 


$$x(k) = x(k-1) + \frac{r}{100n}\,x(k-1) = \left(1 + \frac{r}{100n}\right) x(k-1) \tag{1.1}$$

Writing $a = 1 + r/100n$ gives

x(k)=ax(k-1), k=1,2,3,... (1.2) 
Equation (1.2) is an example of a difference equation, so called because it gives 
an expression for the difference between x(k) and x(k-1). The name
recurrence equation (or relation) is also used, because the values of x(k) can be 
computed recursively (i.e. one after the other) by simply substituting for k in 
(1.2) as follows: 


$$\begin{aligned}
k=1:&\quad x(1) = a\,x(0) \\
k=2:&\quad x(2) = a\,x(1) = a^2 x(0) \\
k=3:&\quad x(3) = a\,x(2) = a^3 x(0)
\end{aligned}$$

and so on. You should be able to spot the pattern: the general solution of (1.2) is

$$x(k) = a^k x(0), \quad k = 1, 2, 3, \ldots \tag{1.3}$$

It's worth pointing out that an equivalent version of (1.2), which is often used, 
is 


x(k+1)=ax(k), k=0,1,2,3,... 


In this case we start counting at k=0. There is no change in the general solution 
as it appears in (1.3). 
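
If you would like to check the pattern in (1.3) numerically, the short Python sketch below steps through the recurrence (1.2) and compares the result with the general solution. The function names and the sample figures (a deposit of 100 at an APR of 5, compounded semi-annually) are assumptions made purely for illustration.

```python
def balance_by_recurrence(x0, r, n, periods):
    """Iterate x(k) = a * x(k-1) with a = 1 + r/(100 n), as in (1.2)."""
    a = 1 + r / (100 * n)
    x = x0
    for _ in range(periods):
        x = a * x
    return x


def balance_closed_form(x0, r, n, periods):
    """Evaluate the general solution x(k) = a**k * x(0) from (1.3)."""
    a = 1 + r / (100 * n)
    return a ** periods * x0


# Illustrative figures only: 100 deposited, APR of 5, two half-year periods.
print(balance_by_recurrence(100.0, 5, 2, 2))
print(balance_closed_form(100.0, 5, 2, 2))
```

Both calls should agree to within rounding error, which is just what (1.3) claims.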

An alternative, widely used notation is to write $x_k$ instead of x(k) to stand
for the value of the variable after k time periods have elapsed. 

So far we have assumed that the account is opened with an initial deposit 
x(0), which is then left alone to accrue interest. However, it’s usual to make 
frequent deposits and withdrawals from a bank account. Suppose that u(k) is 
the net amount deposited during the kth time period, and this does not earn 
interest until the next period; if there is a net withdrawal then u(k) is negative. 
The equation (1.1) then becomes 


$$x(k) = \left(1 + \frac{r}{100n}\right) x(k-1) + u(k), \quad k = 1, 2, 3, \ldots$$


and this is an example of a general difference equation having the form 


$$x(k) = a\,x(k-1) + \beta\,u(k), \quad k = 1, 2, 3, \ldots \tag{1.4}$$

where a and β can take any constant values. Again, an alternative form of (1.4) is

$$x(k+1) = a\,x(k) + \beta\,u(k+1), \quad k = 0, 1, 2, \ldots \tag{1.5}$$
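
As a rough illustration of the bank-account version of (1.4), the Python sketch below steps the balance forward one period at a time, adding each net deposit after the interest. The function name and the figures used (an opening balance of 200, an APR of 4 compounded quarterly, and two net deposits) are invented for the purpose and do not come from the text.

```python
def balance_with_deposits(x0, r, n, net_deposits):
    """Iterate x(k) = (1 + r/(100 n)) * x(k-1) + u(k) for k = 1, 2, ...

    net_deposits holds u(1), u(2), ...; a negative entry is a net withdrawal.
    Returns the list [x(0), x(1), ..., x(K)].
    """
    a = 1 + r / (100 * n)
    x = x0
    history = [x]
    for u in net_deposits:
        x = a * x + u
        history.append(x)
    return history


# Illustrative figures only: deposit 50 in the first quarter, withdraw 30 in the second.
print(balance_with_deposits(200.0, 4, 4, [50.0, -30.0]))
```

Returning the whole history rather than just the final balance makes it easy to see how each deposit feeds into later interest.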


The difference equations (1.4) and (1.5) are called linear because the variables 
are not raised to any powers. This is a mathematical definition which really doesn’t 
convey much information, and because of the importance of the idea of linearity it’s 
worth spending some time discussing it. Suppose we have a ‘black box’ represented 
in Figure 1.1. This box is a mysterious object — we don’t know what is going on 
inside it; all we can do is put in ‘things’ (called inputs) and observe what comes out 
as a result (the outputs). The crucial principle of linearity is as follows. 


Figure 1.1 (a black box with an input and an output)


Figure 1.2 (a black box with input $au_1 + bu_2$ and output $ay_1 + by_2$)


Suppose the response to an input $u_1$ is an output $y_1$, and for an input $u_2$ the
output is $y_2$. Then the combination of inputs $au_1 + bu_2$ produces the same combination
of outputs $ay_1 + by_2$, for any values of the constants a and b (Figure 1.2).

As an illustration, if, say, a=2 and b=0 then $2u_1$ produces $2y_1$; that is,
doubling the input results in a doubling of the output. This connects with the concept 
of a linear relationship between two variables which you will be familiar with as a 
straight line graph. However, the way we have described linearity above is much 
more useful, since we don’t need to know what kind of mathematical processes are 
going on inside the ‘box’, so long as they obey the principle of linearity as described 
above. 


EXAMPLE 1.2


Consider a certain linear ‘system’ which has inputs and outputs which are two-dimensional
vectors. If you've not encountered the concept of vectors, a brief
explanation is as follows. The notation $a = [a_1, a_2]$ denotes a two-dimensional
vector, which can be thought of as the line in the xy-plane from the origin to
the point having coordinates $x = a_1$, $y = a_2$. For any scalar k the product ka is
defined by

$$k[a_1, a_2] = [ka_1, ka_2]$$

which is a line from the origin to the point having coordinates $x = ka_1$, $y = ka_2$. If
$b = [b_1, b_2]$ is a second vector, then the sum of a and b is defined by

$$a + b = [a_1, a_2] + [b_1, b_2] = [a_1 + b_1, a_2 + b_2]$$

Hence the difference of two vectors is

$$a - b = a + (-1)b = [a_1, a_2] + [-b_1, -b_2] = [a_1 - b_1, a_2 - b_2]$$

Suppose it is found for our system that an input vector $u_1 = [2, 1]$ produces an
output vector $y_1 = [3, 0]$, whereas $u_2 = [1, 4]$ produces $y_2 = [0, -1]$. Suppose that
we then put in the combination of inputs

$$2u_1 - 3u_2 = 2[2, 1] - 3[1, 4] = [4, 2] - [3, 12] = [1, -10]$$

The corresponding output is, by the linearity principle, precisely the same
combination of outputs, namely

$$2y_1 - 3y_2 = 2[3, 0] - 3[0, -1] = [6, 0] - [0, -3] = [6, 3]$$


In Example 1.2 we referred to a ‘system’; this could be a mathematical
description involving difference equations, or differential equations, or matrices — 
precisely what is involved is irrelevant. 
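
The arithmetic of Example 1.2 can be repeated in a few lines of Python. This is only a sketch: the helper names `scale`, `add` and `combine` are made up here, and vectors are represented as plain lists. Notice that the code never looks inside the ‘box’; it simply applies the same combination to the known outputs, which is all the linearity principle allows us to do.

```python
def scale(k, v):
    """Multiply the vector v by the scalar k."""
    return [k * vi for vi in v]


def add(v, w):
    """Add two vectors of the same length, component by component."""
    return [vi + wi for vi, wi in zip(v, w)]


def combine(a, v, b, w):
    """Form the linear combination a*v + b*w."""
    return add(scale(a, v), scale(b, w))


# Input/output pairs quoted in Example 1.2.
u1, y1 = [2, 1], [3, 0]
u2, y2 = [1, 4], [0, -1]

combined_input = combine(2, u1, -3, u2)    # the input 2*u1 - 3*u2
combined_output = combine(2, y1, -3, y2)   # by linearity, the response to that input
print(combined_input, combined_output)
```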


EXERCISE 1.1 An old will has just been found showing that your great-grandfather, who 
died 60 years ago, left you £5 (a fair sum of money back then!) which has been 
earning interest at an APR of seven compounded quarterly (i.e. every 3 months). How 
much will you now get? 


EXERCISE 1.2 A savings account pays 5% compounded semi-annually. The initial deposit
is £1000, and net deposits during successive half-years are £476, £355, —£217, £727. 
Determine the balance in the account at the end of 2 years. 


EXERCISE 1.3 A linear system has a vector response [1, -19] when the vector input is
[1, 1], and a response [1, -31] when the input is [2, 1]. Express an input [1, 0] as a
linear combination of the two inputs, that is, find constants a and b such that


[1, 0] = a[1, 1] + b[2, 1]


Hence determine the response of the system when the input is [1, 0]. 


EXERCISE 1.4 If the annual rate of inflation in the economy is 5%, this means that at the 
end of the year you need £1.05 in order to buy what £1 would have purchased at the 
beginning of the year. If this rate of inflation continues for 10 years, how much do 
you need in order to have the equivalent of £100 purchasing power at the start of the 
next decade? 


EXAMPLE 1.3 Fish aquarium model

In an aquarium it is important to prevent the build-up of a high concentration of
salt which is dangerous to fish. Suppose we try to do this in the following way.
We notice that 1 unit of water evaporates during the week. As a restorative
tactic, at the end of every week we remove a further 2 units of water, and then
add 3 units of fresh water. Let n be the total number of units of water in the
aquarium, let s be the concentration of salt per unit of fresh water, and let x(k)
denote the total amount of salt in the aquarium at the end of the kth week, after
the water level has been brought back to normal. At the beginning (k=0) we
therefore have x(0) = ns. After 1 week there are n-1 units of water left, and the
salt content per unit is therefore x(0)/(n-1). We remove two of these units and
add three fresh ones, so the net amount of salt at the end of week 1 is


$$x(1) = \underbrace{x(0)}_{\text{initial amount of salt}} - \underbrace{\frac{2x(0)}{n-1}}_{\text{salt in removed units}} + \underbrace{3s}_{\text{salt in fresh units added}}$$


Similarly, at the end of week k the salt content is 


$$x(k) = \underbrace{x(k-1)}_{\text{salt at beginning of week } k} - \underbrace{\frac{2x(k-1)}{n-1}}_{\text{salt in removed units}} + \underbrace{3s}_{\text{salt in fresh units added}}$$


Simplifying this equation gives 


$$x(k) = \frac{n-3}{n-1}\,x(k-1) + 3s, \quad k = 1, 2, 3, \ldots \tag{1.6}$$

which you can see has exactly the form (1.4) with $a = (n-3)/(n-1)$, $\beta u(k) = 3s$
(for all k).


EXAMPLE 1.4 Rabbit population model 


This description of a rabbit population originates with an Italian mathematician 
called Fibonacci in the early part of the thirteenth century, and because of its 
long history has been widely studied. We make the following assumptions: 


(i) begin with a pair of newborn rabbits (one male, one female); 

(ii) a newborn pair matures (i.e. becomes adult) after 1 month, and 
produces the first offspring at 2 months of age; 

(iii) a pair (one male, one female) is born to each pair of adult rabbits at the 
end of every month; 

(iv) once paired, rabbits remain faithful to each other and do not die! 


These assumptions are, of course, not actually attainable in practice — for 
example, (iv) assumes an unlimited food supply together with immortality, so 
we can think of a very large grassy island free of predators as being a rabbit's 
idea of heaven! Let x(k) be the number of pairs of rabbits present at the end of
the kth month, beginning with $x_0 = 1$ (we shall from now on use the neater
notation $x_k$ instead of x(k)). After 1 month, by assumption (ii) this pair has
matured but has no offspring, so $x_1 = 1$. After a second month has passed, a
pair of offspring is produced, so there are in total two pairs (i.e. $x_2 = 2$). At the
end of the third month the original pair has produced another pair of offspring,
and their first-born has matured, so in total $x_3 = 3$. This is represented in Figure
1.3, where the next few months are also shown.


Figure 1.3 (pairs of rabbits at the end of months 0 to 5; circles denote newborn rabbits
and squares mature rabbits; number of pairs: $x_0 = 1$, $x_1 = 1$, $x_2 = 2$, $x_3 = 3$,
$x_4 = 5$, $x_5 = 8$)


In general at the end of the kth month 


$$\underbrace{x_k}_{\text{total number of pairs at end of month } k} = \underbrace{x_{k-1}}_{\text{number of adult pairs at end of month } k} + \underbrace{x_{k-2}}_{\text{number of newborn pairs at end of month } k} \tag{1.7}$$


The way (1.7) is built up is because of assumptions (ii) and (iii): the total
number of pairs at the end of month k-1 (i.e. $x_{k-1}$) becomes the number of
adult pairs 1 month later, and the $x_{k-2}$ pairs at the end of month k-2 all
produce offspring 2 months later. For example, in Figure 1.3 you can see that
$x_5 = x_4 + x_3$: all the shapes in the row for the end of month 4 have become
squares (adults) in the row below, and the three pairs shown at the end of
month 3 all produce offspring (circles) at the end of month 5.

Since $x_0 = 1$, $x_1 = 1$ then by successively substituting k=2, 3, 4, 5, ... into
(1.7) we obtain

$$x_2 = x_1 + x_0 = 2, \quad x_3 = x_2 + x_1 = 3, \quad x_4 = x_3 + x_2 = 5,$$
$$x_5 = x_4 + x_3 = 8, \quad x_6 = x_5 + x_4 = 13, \ldots$$

The numbers in this sequence 
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89,... 


in which each one is the sum of the preceding two, are called Fibonacci 
numbers. They arise in an interesting variety of applications, especially in 
nature, but we have room for only the briefest mention of this fascinating 
subject. For example, garden varieties of daisies have 13, 21 or 34 petals, and 
other types have 55 or 89 petals. Fibonacci numbers also arise when counting 
spirals on pine cones, on pineapples, and spirals of seeds on a mature 
sunflower head; and arrangements of leaves on stems of trees also exhibit 
Fibonacci number patterns. In music, an octave on the keyboard of a piano 
consists of 13 keys, 5 of which are black and 8 white. 


The so-called golden rectangle has visually pleasing dimensions which have
been used widely in art and architecture for many centuries. The ratio of the lengths
of its sides is $\tfrac{1}{2}(\sqrt{5}-1) : 1$. This ratio is known as ‘golden’ or ‘divine’, since in
ancient times people believed it expressed God-given beauty. Nowadays the closest
most people get to gold is the ever-useful credit card, and if you measure one you’ll
find, appropriately enough, that its dimensions are virtually those of a golden
rectangle! It is intriguing that if you compute the ratio of successive Fibonacci
numbers, namely

$$\frac{x_0}{x_1},\ \frac{x_1}{x_2},\ \frac{x_2}{x_3},\ \frac{x_3}{x_4},\ \ldots,\ \frac{x_k}{x_{k+1}},\ \ldots$$


then as k gets larger and larger this ratio gets closer and closer to the number 


$$\tfrac{1}{2}(\sqrt{5} - 1) = 0.618\,033\,98\ldots$$


We'll see in Section 4.1, Chapter 4, that Fibonacci numbers are also useful in 
optimization (finding maximum or minimum values of functions). 
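
To watch the ratio of successive Fibonacci numbers settle down, you might try a short Python sketch along the following lines. The function name `fibonacci` and the particular values of k printed are arbitrary choices; the starting values $x_0 = x_1 = 1$ follow the rabbit model above.

```python
import math


def fibonacci(count):
    """Return [x_0, ..., x_{count-1}] with x_0 = x_1 = 1 and x_k = x_{k-1} + x_{k-2}."""
    xs = [1, 1]
    while len(xs) < count:
        xs.append(xs[-1] + xs[-2])
    return xs[:count]


golden = (math.sqrt(5) - 1) / 2   # the limiting value quoted in the text
xs = fibonacci(25)
for k in (0, 1, 2, 5, 10, 20):
    ratio = xs[k] / xs[k + 1]
    print(k, ratio, abs(ratio - golden))
```

The last column printed is the gap between the ratio and $\tfrac{1}{2}(\sqrt{5}-1)$, which shrinks rapidly as k grows.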


EXERCISE 1.5 Using a calculator, work out the values of the ratio of Fibonacci numbers
$x_k / x_{k+1}$ for k = 1, 2, 3, ...
EXERCISE 1.6 Verify by direct substitution that an expression for the kth Fibonacci
number in the form

$$x_k = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{k+1} - \left(\frac{1-\sqrt{5}}{2}\right)^{k+1}\right], \quad k = 0, 1, 2, 3, \ldots \tag{1.8}$$

satisfies equation (1.7). Verify that it gives the correct values for k=0, 1, 2, 3.


It is interesting that despite the presence of $\sqrt{5}$ in (1.8), the formula always gives an
integer value for $x_k$ (see Problem 1.7). We shall see later in Section 1.2 (Example
1.8) how to derive (1.8).
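
A numerical check is no substitute for the verification asked for in Exercise 1.6, but it can be reassuring. The Python sketch below assumes that (1.8) has the form $x_k = \frac{1}{\sqrt{5}}\left[\left(\frac{1+\sqrt{5}}{2}\right)^{k+1} - \left(\frac{1-\sqrt{5}}{2}\right)^{k+1}\right]$, which agrees with $x_0 = x_1 = 1$, and compares it with the recurrence (1.7). The function names are invented for illustration, and floating-point arithmetic means the formula is only evaluated approximately.

```python
import math


def fib_recurrence(k):
    """Return x_k from x_k = x_{k-1} + x_{k-2} with x_0 = x_1 = 1."""
    current, following = 1, 1
    for _ in range(k):
        current, following = following, current + following
    return current


def fib_formula(k):
    """Evaluate the assumed closed form for x_k in floating-point arithmetic."""
    root5 = math.sqrt(5)
    return (((1 + root5) / 2) ** (k + 1) - ((1 - root5) / 2) ** (k + 1)) / root5


for k in range(10):
    print(k, fib_recurrence(k), fib_formula(k))
```

Rounding the value of `fib_formula(k)` to the nearest integer should reproduce `fib_recurrence(k)` for moderate values of k.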


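EXERCISE 1.7 Notice that for large values of k, since $(1 - \sqrt{5})/2 \approx -0.62$, the second term
inside the square brackets in (1.8) becomes extremely small, and therefore

$$x_k \approx \frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{k+1}$$

Hence deduce that for sufficiently large values of k

$$\frac{x_k}{x_{k+1}} \approx \frac{2}{1+\sqrt{5}} = \frac{\sqrt{5}-1}{2}$$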


EXERCISE 1.8 Denote the Fibonacci numbers by $f_0 = 1$, $f_1 = 1$, $f_2 = 2$, $f_3 = 3$ and so on.
(a) Verify that

$$f_0 + f_1 = f_3 - 1, \quad f_0 + f_1 + f_2 = f_4 - 1$$

and prove by induction that in general

$$f_0 + f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$$

(See the appendix to this chapter for a description of the method of proof by
induction, if you are not already familiar with it.)
(b) Similarly, verify that

$$f_0 + f_2 = f_3, \quad f_0 + f_2 + f_4 = f_5$$

and prove by induction that in general

$$f_0 + f_2 + f_4 + \cdots + f_{2n} = f_{2n+1}$$


EXERCISE 1.9 Consider the following model of a bee population in a hive. Unfertilized 
eggs laid by a queen bee hatch into males, and fertilized eggs hatch into females. In 
other words, male bees do not have fathers. The queen bee is able to regulate the 
gender of her offspring to meet the needs of the hive according to information 
supplied by her attendants. Thus the male’s only function in the hive is his role in the 
production of females! 
Figure 1.4 (start of the ancestry diagram of a single male bee, with symbols for male and
female bees)


The ancestry of a single male can be traced using an appropriate diagram, which 
begins as shown in Figure 1.4. This is to be read upwards (in the direction of the 
arrow) and shows that the male in question had a mother only, together with a 
grandmother and grandfather. The previous stage is shown in Figure 1.5. Notice that 
the female parentage always branches into two. 


Figure 1.5 (ancestry diagram showing the previous stage)


Extend the diagram in Figure 1.5 by going back two more generations.
Now count the numbers of bees at each level in the ancestry diagram. For
example, in Figure 1.5, this time starting at the top and going downwards, we have
the following:

Male   Female   Total
1      0        1
0      1        1
1      1        2
1      2        3

Extend the table to cover your diagram, and confirm that the numbers in the ‘total’
column are the Fibonacci numbers. Prove that this is true in general. (Hint: if $x_k$ is the
number in row k in the ‘total’ column, show that the numbers in the ‘male’ and
‘female’ columns in the same row are respectively $x_{k-2}$ and $x_{k-1}$.)


The Fibonacci equation (1.7) is different from the equations in our earlier
examples, in that it contains the variables at three moments in time, namely $x_{k+2}$,
$x_{k+1}$ and $x_k$. Since this involves differences at two units of time (i.e. $x_{k+2} - x_k$) it is
called a second-order equation, in contrast with the simpler type in (1.4) which is
called first-order. A useful way of expressing second-order equations is to employ
the notation and ideas of matrices. Let’s consider a more general form than (1.7),
which we can write as


$$x_{k+2} = a\,x_{k+1} + b\,x_k, \quad k = 0, 1, 2, \ldots \tag{1.9}$$


where a and b are constants. In order to start things off with second-order equations
we need to be given two known values, usually those of $x_0$ and $x_1$; for example, in
the Fibonacci equation we had $x_0 = 1$, $x_1 = 1$. Define a ‘new’ variable by $y_0 = x_1$,
$y_1 = x_2$, $y_2 = x_3$, ... so that in general

$$y_k = x_{k+1}, \quad k = 0, 1, 2, 3, \ldots \tag{1.10}$$

and also $y_{k+1} = x_{k+2}$. Substituting for this new variable into (1.9) gives

$$y_{k+1} = b\,x_k + a\,y_k \tag{1.11}$$

We can combine together (1.10) and (1.11) into a matrix form

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ b & a \end{bmatrix} \begin{bmatrix} x_k \\ y_k \end{bmatrix} \tag{1.12}$$


Provided that you are acquainted with the basics of matrix algebra (if not, don’t 
worry, as we’ll give an explanation in Section 1.4) then you'll realize that on 
expanding out the product in (1.12) we simply get back to (1.10) and (1.11). 
Equation (1.12) is itself a special case of an equation which is yet more general: 


$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = A \begin{bmatrix} x_k \\ y_k \end{bmatrix}, \quad k = 0, 1, 2, \ldots \tag{1.13}$$


where A can be an arbitrary 2 × 2 matrix. It is interesting to consider a situation
where (1.13) arises directly, rather than as in (1.12) by converting the single second-order
equation (1.9) into matrix form.
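
To see the matrix form at work, here is a small Python sketch which applies a 2 × 2 matrix repeatedly to the pair $[x_k, y_k]$. It assumes the matrix in (1.12) has rows [0, 1] and [b, a], and takes the Fibonacci case a = b = 1 with $x_0 = x_1 = 1$. The function names `step` and `iterate` are hypothetical, and matrices are stored simply as nested lists.

```python
def step(matrix, state):
    """Multiply a 2x2 matrix (list of two rows) by the column vector [x, y]."""
    (p, q), (r, s) = matrix
    x, y = state
    return [p * x + q * y, r * x + s * y]


def iterate(matrix, state, steps):
    """Apply the matrix `steps` times, giving [x_k, y_k] for k = steps."""
    for _ in range(steps):
        state = step(matrix, state)
    return state


a, b = 1, 1                  # the Fibonacci case of the second-order equation
A = [[0, 1], [b, a]]         # assumed form of the matrix in (1.12)
start = [1, 1]               # [x_0, y_0] = [x_0, x_1]

for k in range(6):
    print(k, iterate(A, start, k))
```

Since $y_k = x_{k+1}$, each printed pair is just two neighbouring Fibonacci numbers.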


EXAMPLE 1.5 Bird population model


Let's consider only the females in a population consisting of a single bird 
species. This is assumed to obey the following rules, which have been 
constructed on the basis of observations made over a number of years: 


(i) a proportion $\alpha$ of juvenile females born in one year survives to become
adults in the following spring;

(ii) each surviving adult female lays eggs in spring to produce an average
of $\gamma$ juveniles by the next spring;

(iii) adults die during the year for various reasons, a proportion $\beta$ surviving
from one spring to the next.
Let $x_k$, $y_k$ denote the numbers of juvenile and adult females respectively in
year k. Then assumption (ii) states that the number of juveniles in year k+1 is

$$x_{k+1} = \gamma\,y_k, \quad k = 0, 1, 2, 3, \ldots$$

The other two assumptions tell us that the number of adult females in year k+1
is

$$y_{k+1} = \underbrace{\alpha\,x_k}_{\text{juveniles from year } k \text{ who achieved adulthood}} + \underbrace{\beta\,y_k}_{\text{adults surviving from year } k}, \quad k = 0, 1, 2, \ldots$$

Combining these two equations together gives us the matrix form (1.13):

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = \underbrace{\begin{bmatrix} 0 & \gamma \\ \alpha & \beta \end{bmatrix}}_{A} \begin{bmatrix} x_k \\ y_k \end{bmatrix}$$

The assumptions (i)-(iii) are of course idealized. In real life the birth and
death rates vary with the size of population due to limited food supplies and
overcrowding, so that $\alpha$, $\beta$ and $\gamma$ are not constants.
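
The bird model is easy to experiment with. The Python sketch below steps the two equations forward year by year, writing `alpha`, `beta` and `gamma` for the proportion of juveniles reaching adulthood, the adult survival proportion and the average number of juveniles per adult. The numerical values (0.3, 0.5 and 2) and the starting population of 100 adults are made up purely for illustration and are not taken from any observations.

```python
def simulate(alpha, beta, gamma, x0, y0, years):
    """Iterate x_{k+1} = gamma*y_k, y_{k+1} = alpha*x_k + beta*y_k.

    x is the number of juvenile females, y the number of adult females.
    Returns the list of (x_k, y_k) pairs for k = 0, 1, ..., years.
    """
    x, y = x0, y0
    history = [(x, y)]
    for _ in range(years):
        # Both new values are computed from the previous year's x and y.
        x, y = gamma * y, alpha * x + beta * y
        history.append((x, y))
    return history


# Hypothetical parameter values, chosen only to show the mechanics.
for year, (juveniles, adults) in enumerate(simulate(0.3, 0.5, 2.0, 0.0, 100.0, 5)):
    print(year, juveniles, adults)
```

Trying different values of the three parameters shows quickly whether the population in the model grows or dies out.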


Economics is a fertile field for discrete time models since the relevant data (e.g. 
profits, investments, income, etc.) are obtained at well-defined instants of time — 
usually weekly, quarterly or yearly. 


EXERCISE 1.10 A very simple model of a national economy assumes that in year k the
national income $I_k$ is equal to $C_k + P_k + G_k$, where $C_k$ is consumer expenditure (e.g.
on consumer goods), $P_k$ is private investment (e.g. on manufacturing equipment) and
$G_k$ is government expenditure (e.g. on education and health). The following
assumptions are based on investigation of past data:


(i) consumer spending is proportional to national income in the previous year,
that is

$$C_k = \alpha I_{k-1}$$

where $\alpha$ is a constant;
(ii) private investment is proportional to the change in consumer spending over
the previous year, i.e.

$$P_k = \beta (C_k - C_{k-1})$$

where $\beta$ is a constant.
Show that $I_k$ satisfies the second-order difference equation

$$I_{k+2} - \alpha(1+\beta) I_{k+1} + \alpha\beta I_k = G_{k+2}, \quad k = 0, 1, 2, \ldots$$

EXERCISE 1.11 A model for population movements into and out of California is based on
evidence that 10% of the United States population outside California moves into that
state every year, whereas 20% of the population of California moves out every year.
Let $x_k$, $y_k$ be the numbers of people living respectively outside and inside
California in year k. Derive a model in the form (1.13), and state the matrix A.
By using the substitution

$$x_k = 2u_k + v_k, \quad y_k = u_k - v_k$$

show that

$$2u_{k+1} + v_{k+1} = 2u_k + 0.7v_k$$
$$u_{k+1} - v_{k+1} = u_k - 0.7v_k$$

Hence show that

$$u_k = u_0, \quad v_k = (0.7)^k v_0, \quad \text{for } k = 1, 2, 3, \ldots$$

Finally, deduce that after sufficient years have passed so that $(0.7)^k \approx 0$, then under
the stated assumptions one-third of the population of the United States would be
living in California.

1.2 SOLUTION OF DIFFERENCE EQUATIONS 


We now study in more detail how to solve linear difference equations. We saw in the 
previous section that the general solution of the first-order equation 


$$x_{k+1} = a\,x_k, \quad k = 0, 1, 2, 3, \ldots \tag{1.14}$$


is $x_k = a^k x_0$. It is often important in applications to know what happens to $x_k$ as k
becomes larger and larger. We use the notation $k \to \infty$ to mean that k increases
indefinitely. Clearly if a is a real number whose magnitude is less than one then $a^k$
gets smaller and smaller as k increases. We write


$$a^k \to 0 \quad \text{as} \quad k \to \infty$$


(read as: ‘$a^k$ tends to zero as k tends to infinity’) to mean that the magnitude of $a^k$
can be made smaller than any positive quantity you care to name, simply by making 
k large enough. To take a simple example, if a=0.1 then we can make $a^k$ smaller
than $10^{-20}$, say, by taking k > 20. Conversely, if a has magnitude greater than one,
then $a^k$ gets bigger and bigger as k increases, and we now write


$$a^k \to \infty \quad \text{as} \quad k \to \infty$$


(‘$a^k$ tends to infinity as k tends to infinity’) to mean that the magnitude of $a^k$ can be
made larger than any specified positive quantity simply by taking k large enough.

We use the notation |a|, the modulus of a, to denote the magnitude, or 
numerical value, of a irrespective of its sign, so |a| < 1 means that -1 < a < 1. If a
is a complex number with real and imaginary parts u and v, so that a = u + iv where
$i^2 = -1$, then a can be represented in the so-called Argand diagram shown in Figure
1.6. The modulus |a| of a is the distance r from the origin to the point with
coordinates u and v, so


$$|a| = r = (u^2 + v^2)^{1/2} \tag{1.15}$$
Figure 1.6 (Argand diagram with axes ‘Real part’ and ‘Imaginary part’, showing the
point a at distance r from the origin and angle θ)


The argument θ is the angle shown in Figure 1.6, and u = r cos θ, v = r sin θ so that

$$a = r(\cos\theta + i\sin\theta)$$

A famous result about complex numbers states that for any angle θ (in radians)

$$e^{i\theta} = \cos\theta + i\sin\theta$$

so we can write $a = re^{i\theta}$. Moreover, for any positive integer k it follows that

$$(e^{i\theta})^k = e^{ik\theta} = \cos k\theta + i\sin k\theta$$

Let’s look at what happens to the form of the solution of (1.14) when a is complex:

$$x_k = a^k x_0 = (re^{i\theta})^k x_0 = r^k(\cos k\theta + i\sin k\theta)\,x_0$$

Now, for any angle θ we always have

$$-1 \le \cos k\theta \le 1, \quad -1 \le \sin k\theta \le 1$$

It therefore follows that as $k \to \infty$ then $x_k \to 0$ whenever r < 1, and $x_k \to \infty$ if r > 1.

We can combine together the results for when a is real or complex, and say that
as $k \to \infty$ then $x_k \to 0$ when |a| < 1, and $x_k \to \infty$ if |a| > 1, with the understanding
that when a is a complex number its modulus is defined by (1.15).
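
Python's built-in complex numbers make it easy to try this out. The sketch below is only an illustration: the function names and the two sample values of a (one with modulus less than one, one with modulus greater than one) are arbitrary choices. It uses `cmath.polar` to obtain r and θ, and `cmath.rect` to rebuild $r^k(\cos k\theta + i\sin k\theta)$.

```python
import cmath


def solve(a, x0, k):
    """Return x_k = a**k * x_0, the solution of x_{k+1} = a * x_k."""
    return a ** k * x0


def polar_power(a, k):
    """Compute a**k as r**k * (cos(k*theta) + i*sin(k*theta))."""
    r, theta = cmath.polar(a)            # modulus r and argument theta of a
    return cmath.rect(r ** k, k * theta)


a_small = 0.6 + 0.6j   # illustrative value with |a| < 1
a_large = 1 + 1j       # illustrative value with |a| > 1

for k in (1, 10, 50):
    print(k, abs(solve(a_small, 1, k)), abs(solve(a_large, 1, k)))
```

The printed moduli shrink towards zero in the first case and grow without limit in the second, whatever the argument θ happens to be.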


EXERCISE 1.12 Investigate the solution of (1.14) when $a = \pm 1$, or $a = \pm i$.


We now move on to more general first-order equations, first seen in (1.4) and 
(1.5), where there is an extra term on the right-hand side of (1.14). Let’s look first at 


$$x_{k+1} = a\,x_k + c, \quad k = 0, 1, 2, \ldots \tag{1.16}$$

where c is a constant. Substitute consecutive values of k into (1.16), starting with
k=0:

$$\begin{aligned}
x_1 &= a x_0 + c \\
x_2 &= a x_1 + c = a(a x_0 + c) + c = a^2 x_0 + ac + c \\
x_3 &= a x_2 + c = a^3 x_0 + a^2 c + ac + c \\
x_4 &= a x_3 + c = a^4 x_0 + (a^3 + a^2 + a + 1)c
\end{aligned}$$

You should be able to spot the pattern: for a general value of k this is

$$x_k = a^k x_0 + (a^{k-1} + a^{k-2} + \cdots + a^2 + a + 1)c \tag{1.17}$$

The term within brackets in this expression is 
$$S_k = 1 + a + a^2 + \cdots + a^{k-1} \tag{1.18}$$


and is called a geometric series, each term in it being a times the previous one,
starting with 1 (there are k terms altogether). Multiplying (1.18) by a gives


$$S_k a = a + a^2 + \cdots + a^k \tag{1.19}$$

and subtracting (1.19) from (1.18) produces

$$S_k - S_k a = 1 - a^k \tag{1.20}$$


which can be simplified to

$$S_k = \frac{1 - a^k}{1 - a}$$

This is a well-known formula for the sum of a geometric series, but you have 
probably noticed that it doesn’t work if a=1, since we can’t then divide by 1— a; 
indeed (1.20) simply says that 0=0! However, when a= 1 each term in the series in 
(1.18) is itself 1, so that S,=k. Putting these facts together, the general solution 
(1.17) of (1.16) is 


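$$x_k = \begin{cases} a^kx_0 + \dfrac{(1 - a^k)c}{1 - a}, & a \ne 1 \\ x_0 + kc, & a = 1 \end{cases} \qquad (1.21)$$

We have seen that if $|a| < 1$ then $a^k \to 0$ as $k \to \infty$. Hence in (1.21) when $|a| < 1$
the terms involving $a^k$ become insignificant for large enough $k$, so that $x_k$
approaches $c/(1 - a)$.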
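
As a check on (1.21), the short Python sketch below compares the closed form with direct iteration of (1.16). The function names and the numerical values of $a$, $c$ and $x_0$ are illustrative assumptions, not taken from the text.

```python
def iterate(a, c, x0, k):
    """Apply x_{j+1} = a*x_j + c exactly k times, starting from x0."""
    x = x0
    for _ in range(k):
        x = a * x + c
    return x

def closed_form(a, c, x0, k):
    """Formula (1.21): geometric-series form for a != 1, linear growth for a == 1."""
    if a == 1:
        return x0 + k * c
    return a ** k * x0 + (1 - a ** k) * c / (1 - a)

# Hypothetical data with |a| < 1, so x_k should approach c/(1 - a) = 6
a, c, x0 = 0.5, 3.0, 10.0
for k in (1, 5, 20, 60):
    print(k, iterate(a, c, x0, k), closed_form(a, c, x0, k))
```

The two columns agree for every $k$, and both settle down towards $c/(1-a)$ as $k$ increases.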
EXAMPLE 1.6


Let's return to the aquarium model described in Example 1.3. Comparing equations
(1.6) and (1.16) we see that

$$a = \frac{n-3}{n-1}, \qquad c = 3s$$

so that $0 < a < 1$. Therefore as $k$ becomes large, the total amount of salt $x_k$ in the
aquarium approaches

$$\frac{c}{1-a} = \frac{3s(n-1)}{2}$$

Since the aquarium contains $n$ units of water, the concentration of salt is
$3s(n-1)/2n$. If the tank is large then $(n-1)/2n \approx \tfrac{1}{2}$, so the concentration of salt
becomes approximately $3s/2$ after a long period of time has elapsed, that is,
50% higher than the original concentration.


EXERCISE 1.13 Use (1.21) to determine the solution of the following equations, subject 
to the given condition: 


(a) X44, 2x, =4, % =3 
(0) Xp) — Xe = 2, X=T 


In each case, verify by direct substitution into the equation that your solution is 
correct. 


EXERCISE 1.14 Suppose that a different scheme is operated for the aquarium model in 
Example 1.3, in which at the end of each week we remove a single unit of water, and 
then add 2 units of fresh water so as to bring the level back to normal (recall that 1 
unit of water evaporates during the week). Obtain the difference equation in this case, 
corresponding to (1.6). Show that in this case, again assuming that n is large, the 
concentration of salt will effectively double over a long time period. 


Let’s now increase the level of difficulty a further notch by taking the extra term 
in the equation to be k, instead of a constant: 


$$x_{k+1} = ax_k + k, \qquad k = 0, 1, 2, \ldots \qquad (1.22)$$

To get the solution, again substitute consecutive values of $k$:

$$x_1 = ax_0$$
$$x_2 = ax_1 + 1 = a(ax_0) + 1 = a^2x_0 + 1$$
$$x_3 = ax_2 + 2 = a^3x_0 + a + 2$$
$$x_4 = ax_3 + 3 = a^4x_0 + a^2 + 2a + 3$$

Here you should be able to see that the pattern for a general value of $k$ is

$$x_k = a^kx_0 + a^{k-2} + 2a^{k-3} + 3a^{k-4} + \cdots + (k-2)a + k - 1 \qquad (1.23)$$

The expression (1.23) is the general solution of (1.22), as can be verified by direct
substitution. When $a = 1$ then (1.23) reduces to

$$x_k = x_0 + 1 + 2 + 3 + \cdots + (k-1) = x_0 + \tfrac{1}{2}k(k-1)$$

where we have used a standard formula for the sum of the integers from 1 to $k-1$
(see (A1) in the appendix to this chapter).
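
The pattern in (1.23) can be tested by machine rather than by hand. In the sketch below the names `iterate` and `pattern` and the trial values of $a$ and $x_0$ are assumptions chosen for illustration; integer arithmetic is used so that the comparison is exact.

```python
def iterate(a, x0, k):
    """Apply x_{j+1} = a*x_j + j for j = 0, ..., k-1, as in (1.22)."""
    x = x0
    for j in range(k):
        x = a * x + j
    return x

def pattern(a, x0, k):
    """Right-hand side of (1.23): a^k x0 + sum over j = 1..k-1 of j * a^(k-1-j)."""
    return a ** k * x0 + sum(j * a ** (k - 1 - j) for j in range(1, k))

# Illustrative integer values of a, including the special case a = 1
for a in (1, 2, -3):
    print(a, [iterate(a, 5, k) == pattern(a, 5, k) for k in range(8)])
```

For $a = 1$ the function `pattern` reduces to $x_0 + \tfrac{1}{2}k(k-1)$, as stated above.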
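EXERCISE 1.15 A standard identity (see Problem 1.10) states that

$$\theta + 2\theta^2 + 3\theta^3 + \cdots + (k-1)\theta^{k-1} = \frac{\theta[1 - k\theta^{k-1} + (k-1)\theta^k]}{(1-\theta)^2}$$

Use this to show that (1.23) can be rewritten as

$$x_k = a^kx_0 + \frac{a^k - ka + k - 1}{(1-a)^2}, \qquad a \ne 1 \qquad (1.24)$$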


EXERCISE 1.16 Determine the solution of each of the following equations valid for
$k = 0, 1, 2, 3, \ldots$, and subject to the stated condition:

(a) $x_{k+1} = 3x_k + k$, $x_0 = 4$
(b) $x_{k+1} + 2x_k = 2 - k$, $x_0 = 2$
(c) $x_{k+1} - x_k = k + 2$, $x_0 = -1$

In each case, verify that your answer is correct by checking that it does indeed satisfy
the given equation. Notice that in (b) and (c) you need to use the principle of
linearity: regard the right-hand side as a sum of two 'inputs', and the corresponding
solution as the 'output'.


EXERCISE 1.17 Consider a roll of kitchen foil which is wound around a cardboard core
cylinder of radius 3 cm. The foil is 0.005 cm thick, so when the foil is wrapped $k$
times around the core the outer radius of the roll is $3 + 0.005k$ cm. Let $x_k$ be the total
length of foil when it is wrapped $k$ times around the core, with $x_0 = 0$.

(a) Show that

$$x_{k+1} = x_k + 6\pi + 0.01\pi k, \qquad k = 0, 1, 2, \ldots$$

and solve this equation to obtain an expression for $x_k$.


(b) When the outer radius of the roll is 3.2 cm, what is the total length of foil? 
(c) What is the outer radius of the roll when it holds a total length of 209 cm? 


We can now turn to solving second-order equations in the form (1.9), which
we'll rearrange as

$$x_{k+2} - ax_{k+1} - bx_k = 0, \qquad k = 0, 1, 2, \ldots \qquad (1.25)$$

In view of the solution of the first-order case (1.14), which we found to be in the
form $x_k = a^kx_0$, it seems reasonable to see if this also works for (1.25). To avoid
confusion, we'll use a parameter $\lambda$ instead of $a$, and try $x_k = \lambda^k$, where clearly $\lambda \ne 0$,
since $x_k = 0$ is a trivial and uninteresting solution. Substituting this into (1.25) gives

$$\lambda^{k+2} - a\lambda^{k+1} - b\lambda^k = 0$$

which can be factorized as

$$\lambda^k(\lambda^2 - a\lambda - b) = 0$$

We have rejected $\lambda = 0$ as a possibility, so we must have

$$\lambda^2 - a\lambda - b = 0 \qquad (1.26)$$

If this quadratic equation has two roots $\lambda_1$, $\lambda_2$ different from each other, then
we've shown that each of

$$x_k = c_1\lambda_1^k, \qquad x_k = c_2\lambda_2^k$$

is a solution of (1.25), for any constants $c_1$ and $c_2$. It therefore follows by the
principle of linearity that

$$x_k = c_1\lambda_1^k + c_2\lambda_2^k \qquad (1.27)$$

is also a solution, and this is in fact the most general form. It contains two constants
$c_1$ and $c_2$ whose values are determined by using two given conditions, usually
specified values of $x_0$ and $x_1$.


EXAMPLE 1.7


We solve the equation

$$x_{k+2} + 5x_{k+1} + 6x_k = 0 \qquad (1.28)$$

subject to $x_0 = 2$, $x_1 = 3$. The quadratic equation (1.26) is

$$\lambda^2 + 5\lambda + 6 = 0$$

which factorizes to

$$(\lambda + 2)(\lambda + 3) = 0$$

so the roots are $\lambda_1 = -2$, $\lambda_2 = -3$. The solution of (1.28) is therefore

$$x_k = c_1(-2)^k + c_2(-3)^k \qquad (1.29)$$

To find the values of $c_1$ and $c_2$ we substitute $k = 0$, $k = 1$ into (1.29) and use
$(-2)^0 = 1$, $(-3)^0 = 1$ to get

$$2 = c_1 + c_2$$
$$3 = -2c_1 - 3c_2$$

The solution of these equations is $c_1 = 9$, $c_2 = -7$, so the required solution of
(1.28) is therefore

$$x_k = 9(-2)^k - 7(-3)^k, \qquad k = 0, 1, 2, \ldots$$
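
The steps used in this example (find the two roots of (1.26), then fit $c_1$ and $c_2$ to $x_0$ and $x_1$) can be sketched in Python as follows. The function names are assumptions introduced for illustration, and the code only covers the case of two different roots; `cmath.sqrt` is used so that complex roots are handled as well.

```python
import cmath

def solve_distinct_roots(a, b, x0, x1):
    """Return (lam1, lam2, c1, c2) for x_{k+2} - a*x_{k+1} - b*x_k = 0,
    assuming lam^2 - a*lam - b = 0 has two different roots."""
    disc = cmath.sqrt(a * a + 4 * b)
    lam1, lam2 = (a + disc) / 2, (a - disc) / 2
    # Fit the constants: x0 = c1 + c2 and x1 = c1*lam1 + c2*lam2
    c2 = (x1 - lam1 * x0) / (lam2 - lam1)
    c1 = x0 - c2
    return lam1, lam2, c1, c2

def closed_form(a, b, x0, x1, k):
    """Evaluate the general solution (1.27) at step k."""
    lam1, lam2, c1, c2 = solve_distinct_roots(a, b, x0, x1)
    return c1 * lam1 ** k + c2 * lam2 ** k

def iterate(a, b, x0, x1, k):
    """Compute x_k directly from x_{j+2} = a*x_{j+1} + b*x_j."""
    xs = [x0, x1]
    while len(xs) <= k:
        xs.append(a * xs[-1] + b * xs[-2])
    return xs[k]

# Equation (1.28) corresponds to a = -5, b = -6 in the notation of (1.25)
print(solve_distinct_roots(-5, -6, 2, 3))
for k in range(6):
    print(k, iterate(-5, -6, 2, 3, k), closed_form(-5, -6, 2, 3, k).real)
```

The first line of output reproduces the roots $-2$, $-3$ and the constants $9$, $-7$ (as complex numbers with zero imaginary part), and the remaining lines compare direct iteration with the closed form.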
EXAMPLE 1.8


We can now derive the solution given in (1.8) of the Fibonacci equation (1.7),
which we write here as

$$x_{k+2} - x_{k+1} - x_k = 0, \qquad k = 0, 1, 2, \ldots$$

The equation (1.26) is

$$\lambda^2 - \lambda - 1 = 0$$

Its roots, using the standard formula for solving a quadratic equation, are

$$\lambda_1, \lambda_2 = \frac{1 \pm \sqrt{1 + 4}}{2} = \frac{1 \pm \sqrt{5}}{2}$$

so that

$$\lambda_1 = \tfrac{1}{2}(1 + \sqrt{5}), \qquad \lambda_2 = \tfrac{1}{2}(1 - \sqrt{5})$$

and the solution is therefore

$$x_k = c_1\left(\frac{1 + \sqrt{5}}{2}\right)^k + c_2\left(\frac{1 - \sqrt{5}}{2}\right)^k \qquad (1.30)$$

It now remains to find the values of $c_1$ and $c_2$ using $x_0 = 1$, $x_1 = 1$. This gives the
simultaneous equations

$$1 = c_1 + c_2$$
$$1 = c_1\left(\frac{1 + \sqrt{5}}{2}\right) + c_2\left(\frac{1 - \sqrt{5}}{2}\right)$$

and you should check that their solution is

$$c_1 = (1 + \sqrt{5})/2\sqrt{5}, \qquad c_2 = (\sqrt{5} - 1)/2\sqrt{5}$$

Substituting these values into (1.30) then gives the solution we quoted earlier in
(1.8).


EXERCISE 1.18 Determine the solution of each of the following second-order equations, 
subject to the stated conditions: 


(a) $x_{k+2} + 7x_{k+1} + 12x_k = 0$, $x_0 = 2$, $x_1 = 3$
(b) $x_{k+2} - 4x_{k+1} + 5x_k = 0$, $x_0 = 2$, $x_1 = 4 + 4i$


with k=0, 1, 2, 3, ... in each case. 


You might be wondering at this stage how we were able to exclude $\lambda = 0$
definitely as a possible value, when you know that a quadratic equation like (1.26)
can sometimes have a zero root. However, if $\lambda = 0$ is substituted as a root into
(1.26), then this requires that $b = 0$, in which case the difference equation (1.25)
becomes

$$x_{k+2} - ax_{k+1} = 0$$

This is no longer a second-order equation, merely a first-order difference equation
(admittedly in a slightly disguised form) and its solution is just $x_k = a^kx_0$, as before.

There is one situation, though, that we must examine in extra detail: this is when
the quadratic equation (1.26) has two equal roots, say $\lambda = \lambda_3$, in which case the
solution (1.27) becomes

$$x_k = (c_1 + c_2)\lambda_3^k = c_3\lambda_3^k, \text{ say}$$

This can't be the complete solution to the difference equation (1.25), since it contains
only a single arbitrary constant. To find the other part of the solution, we try
substituting $x_k = k\lambda_3^k$ into the left-hand side of (1.25), producing

$$x_{k+2} - ax_{k+1} - bx_k = (k+2)\lambda_3^{k+2} - a(k+1)\lambda_3^{k+1} - bk\lambda_3^k$$
$$= k\lambda_3^k(\lambda_3^2 - a\lambda_3 - b) + \lambda_3^{k+1}(2\lambda_3 - a) \qquad (1.31)$$

The first term within brackets in (1.31) is zero since $\lambda_3$ (by definition) satisfies the
quadratic equation (1.26), that is to say

$$0 = \lambda^2 - a\lambda - b \equiv (\lambda - \lambda_3)^2 = \lambda^2 - 2\lambda_3\lambda + \lambda_3^2$$

In this last identity, we see from the terms in $\lambda$ on both sides that $a = 2\lambda_3$, so the
second expression within brackets in (1.31) is also zero. Hence our trial solution
works, so the general solution of the difference equation in this case is

$$x_k = c_3\lambda_3^k + c_4k\lambda_3^k \qquad (1.32)$$

where again $c_3$ and $c_4$ are constants determined by two given values of $x_k$.


EXAMPLE 1.9


The equation

$$x_{k+2} + 6x_{k+1} + 9x_k = 0$$

subject to $x_0 = -1$, $x_1 = 1$ gives rise to the quadratic

$$\lambda^2 + 6\lambda + 9 = (\lambda + 3)^2 = 0$$

Hence $\lambda_3 = -3$, so from (1.32) the solution is

$$x_k = c_3(-3)^k + c_4k(-3)^k$$

Substituting $k = 0$, $k = 1$ gives

$$-1 = c_3$$
$$1 = -3c_3 - 3c_4$$

so that $c_3 = -1$, $c_4 = \tfrac{2}{3}$. The solution is therefore

$$x_k = (-3)^k(-1 + \tfrac{2}{3}k), \qquad k = 0, 1, 2, \ldots$$
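
The repeated-root solution can be confirmed exactly with rational arithmetic. The sketch below is an illustration only; the name `x_closed` is an assumption, and Python's `fractions.Fraction` is used so that the factor $\tfrac{2}{3}$ introduces no rounding error.

```python
from fractions import Fraction

def x_closed(k):
    """Solution found in Example 1.9: (-3)^k * (-1 + 2k/3)."""
    return (-3) ** k * (-1 + Fraction(2, 3) * k)

# Iterate x_{k+2} = -6*x_{k+1} - 9*x_k from x_0 = -1, x_1 = 1
xs = [Fraction(-1), Fraction(1)]
for k in range(10):
    xs.append(-6 * xs[k + 1] - 9 * xs[k])

# Each entry should be True if the closed form matches the iteration
print([x_closed(k) == xs[k] for k in range(12)])
```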
EXERCISE 1.19 Determine the solution of

$$x_{k+2} - 10x_{k+1} + 25x_k = 0$$

subject to $x_0 = 30$, $x_1 = 60$.


EXERCISE 1.20 Obtain the general solution of the equation

$$x_{k+2} - 2ax_{k+1} + x_k = 0, \qquad k = 0, 1, 2, 3, \ldots$$

when $|a| < 1$ by setting $a = \cos\alpha$ (use the result $e^{i\alpha} = \cos\alpha + i\sin\alpha$).
Determine also the general solution when $a = 1$, and when $a = -1$.


The next level of difficulty is when the right-hand side of the second-order
difference equation (1.25) is non-zero. Let's consider just the case when

$$x_{k+2} - ax_{k+1} - bx_k = c, \qquad k = 0, 1, 2, \ldots \qquad (1.33)$$

where $c$ is a non-zero constant. In the first-order case we found (see (1.21)) that
the extra term in the solution was a multiple of $c$, so by analogy it's natural to try the
same thing. Substituting $x_k = pc$, where $p$ is a constant, into (1.33) gives us

$$pc - apc - bpc = c$$

or

$$pc(1 - a - b) = c$$

so that $p = 1/(1 - a - b)$ provided $1 - a - b \ne 0$. The complete solution of (1.33) is
then the general solution (whatever it comes out to be) of the equation with $c = 0$,
that is the equation

$$x_{k+2} - ax_{k+1} - bx_k = 0 \qquad (1.34)$$

plus the extra part $c/(1 - a - b)$. In the standard jargon, (1.34) is called the
homogeneous part of (1.33), because of the zero right-hand side.


EXAMPLE 1.10


Consider the equation

$$x_{k+2} + 5x_{k+1} + 6x_k = 10 \qquad (1.35)$$

The homogeneous part was solved in Example 1.7. Here $c = 10$, $a = -5$, $b = -6$
so the extra part of the solution due to the term on the right-hand side of (1.35)
is

$$10/(1 + 5 + 6) = \tfrac{5}{6}$$

Adding this to the earlier expression in (1.29) gives us the complete solution of
(1.35) as

$$x_k = c_1(-2)^k + c_2(-3)^k + \tfrac{5}{6}$$
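
A quick way to gain confidence in a solution of this kind is to substitute it back into the equation. The Python sketch below does this for (1.35) with exact fractions; the function names and the trial values of $c_1$ and $c_2$ are illustrative assumptions, since the constants are arbitrary until initial conditions are given.

```python
from fractions import Fraction

def x(k, c1, c2):
    """Complete solution of (1.35): homogeneous part plus the extra part 5/6."""
    return c1 * (-2) ** k + c2 * (-3) ** k + Fraction(5, 6)

def residual(k, c1, c2):
    """Left-hand side of (1.35) minus the right-hand side; zero for a solution."""
    return x(k + 2, c1, c2) + 5 * x(k + 1, c1, c2) + 6 * x(k, c1, c2) - 10

# Arbitrary trial constants: the residual should vanish for every k
for c1, c2 in [(0, 0), (1, -2), (7, 3)]:
    print(c1, c2, [residual(k, c1, c2) for k in range(5)])
```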
We need to find out why the method breaks down if $1 - a - b = 0$. This condition
means that the quadratic equation (1.26) has a root $\lambda_1 = 1$, so that the solution of the
homogeneous part is

$$x_k = c_1(1)^k + c_2\lambda_2^k = c_1 + c_2\lambda_2^k$$

which already contains the 'extra part' $x_k$ = constant. The result given in the
following exercise holds in this case.


EXERCISE 1.21 Verify by direct substitution that

$$x_k = ck/(2 - a), \qquad a \ne 2$$

satisfies the equation

$$x_{k+2} - ax_{k+1} - bx_k = c, \qquad 1 - a - b = 0 \qquad (1.36)$$

Similarly, verify that when $a = 2$ the complete solution of (1.36) is

$$x_k = c_1 + c_2k + \tfrac{1}{2}ck^2$$


EXAMPLE 1.11


Let's solve the equation

$$x_{k+2} + 4x_{k+1} - 5x_k = 9 \qquad (1.37)$$

Here $a = -4$, $b = 5$, so $1 - a - b = 0$, and the associated quadratic equation is

$$\lambda^2 + 4\lambda - 5 = (\lambda - 1)(\lambda + 5) = 0$$

which has one root $\lambda_1 = 1$, the other being $\lambda_2 = -5$. Since $c = 9$, from Exercise
1.21 the extra term in the solution is $ck/(2 - a) = 3k/2$. Hence the general solution
of (1.37) is

$$x_k = c_1 + c_2(-5)^k + 3k/2$$


EXERCISE 1.22 Determine the solution of each of the following equations subject to the 
stated conditions: 


(a) $x_{k+2} + 11x_{k+1} + 18x_k = 30$, $x_0 = -1$, $x_1 = 1$
(b) $x_{k+2} - x_k = 1$, $x_0 = 4$, $x_1 = 6$
(c) $4x_{k+2} - 12x_{k+1} + 9x_k = 1$, $x_0 = 0$, $x_1 = 5$


In the next section we’ll introduce a slicker way of tackling difference equations. 
Before doing so, we close this section with another application which is interesting 
because the variable $k$ does not in this case represent time. However, as the example
involves elementary ideas of forces and static equilibrium, those unfamiliar with 
these concepts can skip directly to the next section. 
EXAMPLE 1.12


A wire (whose mass can be neglected) is tightly stretched between two points
as shown in Figure 1.7.

[Figure 1.7]

Particles having masses $m_1, m_2, \ldots, m_n$ are attached to the wire at equal
horizontal distances $d$ apart. In reality the wire will be almost (but cannot be
exactly) horizontal, and the situation affecting three neighbouring masses is
shown in Figure 1.8.

[Figure 1.8]

The tension in the wire is $T$, and the angles made with the horizontal have been
exaggerated in the figure. Denote by $y_k$ the height of the wire above the ground
level for particle $m_k$, which is at a horizontal distance $kd$ from the left-hand end.

[Figure 1.9: one straight portion of the wire, showing the horizontal distance $d$, the vertical distance $y_{k-1} - y_k$ and the angle $\theta_{k-1}$]

We see from Figure 1.9 that

$$\tan\theta_{k-1} = \frac{y_{k-1} - y_k}{d} \qquad (1.38)$$

since the portions of wire between each neighbouring pair of weights are
assumed straight. Also, since the angles (measured in radians) are small we
can assume, to a close approximation, that

$$\tan\theta_{k-1} \approx \theta_{k-1} \approx \sin\theta_{k-1}$$

Equating the vertical components of the forces acting on the $k$th mass gives

$$T\sin\theta_{k-1} - T\sin\theta_k = m_kg$$

so that

$$T(\theta_{k-1} - \theta_k) = m_kg \qquad (1.39)$$

Substituting $\theta_{k-1} = (y_{k-1} - y_k)/d$ from (1.38) into (1.39) produces

$$\frac{y_{k-1} - y_k}{d} - \frac{y_k - y_{k+1}}{d} = \frac{m_kg}{T}$$

which simplifies to

$$y_{k+1} - 2y_k + y_{k-1} = \frac{m_kgd}{T}$$

Replacing $k$ by $k+1$ puts this equation into our more familiar form of second-order
difference equation:

$$y_{k+2} - 2y_{k+1} + y_k = \frac{m_{k+1}gd}{T} \qquad (1.40)$$

and $y_0$, $y_{n+1}$ are the fixed heights of the wire at the left- and right-hand ends
respectively.


EXERCISE 1.23 Suppose that in Example 1.12 the masses are all equal to $m$, and that the
horizontal axis is chosen so that $y_0 = 0$, $y_{n+1} = 0$. Verify by direct substitution into
(1.40) that the general deflection of the wire is

$$y_k = -\frac{mgd\,k(n + 1 - k)}{2T}, \qquad k = 0, 1, 2, \ldots, n+1$$


1.3 THE z-TRANSFORM 


As we progressed through the solution of linear difference equations in the previous 
section, no doubt you found the details rather tedious. What is needed is a 
foolproof procedure which is guaranteed to work without having to worry about 
special cases or other difficulties. Such a scheme is provided by the concept of the 
z-transform, which we’ll introduce in this section. The idea of a transform is 
widely used in mathematics, and broadly speaking consists of changing or 
‘transforming’ a problem into a different one which can be solved more easily. For 
example, the product of two numbers can be found by adding together their 
logarithms — the problem of multiplication is transformed into the simpler one of 
addition. Indeed, before cheap pocket calculators became available around 20 years 
ago, logarithms were widely used for arithmetical work. Admittedly, the 
simplification to be obtained using the z-transform will not be obvious to you at 
first, but stay with it! 
The solution to any difference equation we are interested in will consist of a
sequence of values $x_0, x_1, x_2, \ldots$ (assuming that time commences at $k = 0$) and we'll
use the notation $x_k$ or $\{x_k\}$, with the understanding that $k = 0, 1, 2, \ldots$. We define the
z-transform of this sequence by the series

$$X = x_0 + \frac{x_1}{z} + \frac{x_2}{z^2} + \frac{x_3}{z^3} + \cdots \qquad (1.41)$$

The variable $z$ is simply a parameter which is never assigned a particular value, and
the transform $X$ therefore depends on $z$, often written in the form $X(z)$. Another
useful notation is to write $Z(x_k)$ for the z-transform $X$ of the sequence $x_k$. The
sequence and its transform are called a transform pair.


EXAMPLE 1.13


Suppose that x)=1, x,=3, x,=~-3, x,=5,x,=0 for k>5. Then the z-transform 
of this sequence is simply 


Me ee 


ie grey 


In our applications to the solution of linear difference equations the sequence x, 
will be infinite. How can the apparently complicated infinite series (1.41) then help 
us to solve the equations? 

There are two steps in answering this. First, for many sequences the series (1.41) 
can be expressed in a simple form, as we’ll now illustrate. 


EXAMPLE 1.14


(a) The simplest form $x_k$ can take is to be a constant $c$ for all $k$, so (1.41) gives

$$X = c + \frac{c}{z} + \frac{c}{z^2} + \frac{c}{z^3} + \cdots = c\left(1 + \frac{1}{z} + \frac{1}{z^2} + \cdots\right) = c\left(1 - \frac{1}{z}\right)^{-1} = \frac{cz}{z - 1} \qquad (1.42)$$

where to get (1.42) we have used the binomial expansion formula (see
Problem 1.7):

$$(1 + a)^n = 1 + na + \frac{n(n-1)}{2!}a^2 + \frac{n(n-1)(n-2)}{3!}a^3 + \cdots \qquad (1.43)$$

with $a = -1/z$, $n = -1$ (note that $3! = 3 \times 2 \times 1$, $4! = 4 \times 3 \times 2 \times 1$, etc.).

(b) Next, suppose $x_k$ is a geometric sequence, that is

$$\{x_k\} = 1, c, c^2, c^3, \ldots, c^k, \ldots$$

where $c$ is a constant. From (1.41) we get

$$X = 1 + \frac{c}{z} + \frac{c^2}{z^2} + \cdots = \left(1 - \frac{c}{z}\right)^{-1} = \frac{z}{z - c}$$

where we have again appealed to (1.43), this time with $a = -c/z$ and $n = -1$.


A word is necessary here about the validity of (1.43): strictly speaking, when n 
is not a positive integer then we must have |a| <1 for the identity to hold. However, 
we are getting into the mathematical minefield of what is called ‘convergence of 
series’; the aim of this book is not to dwell upon the rigours of mathematics but to 
convey some ideas and applications. Suffice it to say, then, that for our purposes we 
can always assume that the parameter z can be made to satisfy such requirements as 
are necessary for the results to be valid. 
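
Although $z$ is treated in the text as a parameter that is never given a value, the closed forms in Example 1.14 can be spot-checked by choosing a sample $z$ for which the series converges and summing a large number of terms. The sketch below does this; the function name `partial_transform` and the values $z = 5$, $c = 2$ are assumptions made purely for illustration.

```python
def partial_transform(x, z, terms):
    """Sum of x(k) / z**k for k = 0, ..., terms-1: a truncation of the series (1.41)."""
    return sum(x(k) / z ** k for k in range(terms))

z = 5.0   # sample value with |1/z| < 1 and |c/z| < 1, so both series converge
c = 2.0

# Constant sequence x_k = c, compared with cz/(z - 1) from (1.42)
print(partial_transform(lambda k: c, z, 200), c * z / (z - 1))

# Geometric sequence x_k = c**k, compared with z/(z - c)
print(partial_transform(lambda k: c ** k, z, 200), z / (z - c))
```

In each printed pair the truncated series and the closed form agree to within rounding error.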


EXERCISE 1.24 Determine the z-transforms of the following sequences defined for 
k=0,1,2,...: 


(a) $x_k = k$
(b) $x_k = e^{ck}$, $c$ = constant


Of course, the z-transform of a sequence only needs to be worked out once, and 
tables of such transforms are readily available, the following (Table 1.1) being a 
brief sample. 

Two points are worth mentioning about Table 1.1. First, it can be used in either
direction: we can find the sequence corresponding to a given z-transform by reading
from right to left. Secondly, because of the definition in (1.41) we can see that
z-transformation is a linear operation: that is, according to our earlier discussion, if
$Z(x_k) = X$ and $Z(y_k) = Y$ then by the principle of linearity

$$Z(ax_k + by_k) = aX + bY$$


for any constants a and b. 
Table 1.1 Some transform pairs


Sequence $x_k$                              Transform $Z(x_k)$

$1$                                         $z/(z-1)$
$k$                                         $z/(z-1)^2$
$k^2$                                       $z(z+1)/(z-1)^3$
$k^3$                                       $z(z^2+4z+1)/(z-1)^4$
$c^k$                                       $z/(z-c)$
$kc^k$                                      $cz/(z-c)^2$
$(c_2^k - c_1^k)/(c_2 - c_1)$               $z/[(z-c_1)(z-c_2)]$
$(c_2^{k+1} - c_1^{k+1})/(c_2 - c_1)$       $z^2/[(z-c_1)(z-c_2)]$

EXAMPLE 1.15

By using the appropriate entries in Table 1.1, and the linearity principle, we see
that if

$$x_k = 3^k + 5(4)^k \qquad (1.44)$$

then its z-transform is

$$X = \frac{z}{z-3} + \frac{5z}{z-4} = \frac{z(z-4) + 5z(z-3)}{(z-3)(z-4)} = \frac{z(6z-19)}{z^2 - 7z + 12} \qquad (1.45)$$

Alternatively, suppose we are given the z-transform in (1.45) and asked to
find the sequence to which it corresponds. We simply factorize the denominator
and proceed as follows:

$$\frac{z(6z-19)}{z^2 - 7z + 12} = \frac{6z^2 - 19z}{(z-3)(z-4)} = \frac{6z^2}{(z-3)(z-4)} - \frac{19z}{(z-3)(z-4)}$$

We can obtain the sequence to which each of these transforms corresponds
by using the last two entries in Table 1.1. By the principle of linearity, the
overall sequence is therefore the sum of these two, that is

$$x_k = \frac{6(4^{k+1} - 3^{k+1})}{4 - 3} - \frac{19(4^k - 3^k)}{4 - 3}$$
$$= (-6 \cdot 3 + 19)3^k + (6 \cdot 4 - 19)4^k$$
$$= 3^k + 5(4)^k$$

which agrees with (1.44).


EXERCISE 1.25 Determine the z-transforms of the following sequences defined for 
k=0, 1,2, 3,... by using Table 1.1: 
(a) $x_k = 11(-3)^k - 9(-4)^k$
(b) $x_k = 3(2+i)^k - (2-i)^k$
(c) $x_k = (-3)^k(-1 + \tfrac{2}{3}k)$


EXERCISE 1.26 Use Table 1.1 to determine the sequence x, for each of the following 
z-transforms: 


i » 32° +2 bi 12° -92 
2P+11z+18" 22-3242" (2-1) 


(a) 


Our first step towards solving difference equations with the z-transform has been 
to show how, for a given sequence or transform, we can determine one from the 
other. You can think of a sequence x, and its z-transform X as two sides of the same 
coin: they each represent in different ways the description of some particular process 
in which we are interested. 

We now turn to the second idea which is needed before we can actually proceed 
to solve difference equations. The key to understanding this is provided by the 
following argument. 

Suppose we shift our sequence to the left, so that the values at $k = 0, 1, 2, 3, \ldots$
are now $x_1, x_2, x_3, x_4, \ldots$, as shown in Figure 1.10. Each value has moved one place
to the left. We can denote this new sequence by $x_{k+1}$, where we retain our convention
that we start counting at $k = 0$. By the definition (1.41), the transform of $x_{k+1}$ is

$$x_1 + \frac{x_2}{z} + \frac{x_3}{z^2} + \cdots = z\left(x_0 + \frac{x_1}{z} + \frac{x_2}{z^2} + \cdots\right) - zx_0 = zX - zx_0 \qquad (1.46)$$

where $X$ is the transform of the original sequence $x_k$, as defined in (1.41). In other
words, we have shown that

$$Z(x_{k+1}) = zX - zx_0$$

where $x_0$ is the initial value of $x_k$.

[Figure 1.10: the new sequence $x_{k+1}$, obtained by shifting $x_k$ one place to the left, plotted against $k$]

Thus, apart from the term $-zx_0$, shifting a sequence one place to the left is
equivalent to multiplying its transform by $z$. This is our promised key, which
immediately opens the door to solving first-order equations.
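
The shift result can also be tested numerically on a sample sequence. In the sketch below the sequence $x_k = (k+1)(0.5)^k$, the value $z = 2$ and the function names are all assumptions chosen for illustration; the series is truncated after 200 terms, which is ample for this rapidly converging case.

```python
def partial_transform(x, z, terms=200):
    """Truncated version of the series (1.41): sum of x(k) / z**k."""
    return sum(x(k) / z ** k for k in range(terms))

def x(k):
    """Sample sequence used only for this check."""
    return (k + 1) * 0.5 ** k

z = 2.0
shifted = partial_transform(lambda k: x(k + 1), z)   # transform of x_{k+1}
predicted = z * partial_transform(x, z) - z * x(0)   # zX - z*x_0
print(shifted, predicted)
```

Both printed values are approximately $14/9$, as the shift formula predicts for this particular sequence.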


EXAMPLE 1.16


Let's return to the equation

$$x_{k+1} = ax_k, \qquad k = 0, 1, 2, 3, \ldots \qquad (1.47)$$

first seen in (1.14). The transform of the left-hand side has just been obtained in
(1.46), so the transform of the complete equation (1.47) is

$$zX - zx_0 = aX \qquad (1.48)$$

where $X$ is the transform of $x_k$, that is, the transform of the solution of (1.47).
However, (1.48) is now a purely algebraic equation which we rearrange very
easily to give

$$(z - a)X = zx_0$$

so that (assuming $z \ne a$) we have

$$X = \frac{zx_0}{z - a}$$

From Table 1.1, the sequence is therefore

$$x_k = a^kx_0$$

which agrees with what we found before. So far, nothing much seems to have
been gained. However, what we have done is changed a linear difference
equation to be solved for $x_k$ into a linear algebraic equation to be solved for $X$.
This is the promised simplification which the z-transform technique produces.


It's when we take more complicated difference equations that the benefits really
start to flow. For example, let's try solving

$$x_{k+1} - 3x_k = 4^k \qquad (1.49)$$

Taking the z-transform of each side gives

$$(zX - zx_0) - 3X = \frac{z}{z - 4}$$

where we have obtained the transform of $4^k$ from Table 1.1. Rearranging the
terms produces

$$X = \frac{zx_0}{z - 3} + \frac{z}{(z-4)(z-3)}$$

and by again referring to Table 1.1 we immediately obtain

$$x_k = 3^kx_0 + (4^k - 3^k)$$

as the solution of the equation (1.49). Incidentally, obtaining a sequence from
its transform is called inverting the transform, and $x_k$ is the inverse of $X$.
If the right-hand side in (1.49) is replaced by $3^k$, the transformed equation is

$$zX - zx_0 - 3X = \frac{z}{z - 3}$$

leading to

$$X = \frac{zx_0}{z - 3} + \frac{z}{(z-3)^2}$$

and inverting this transform with the aid of Table 1.1 gives

$$x_k = 3^kx_0 + k3^{k-1}$$

The crucial point to realize is that the procedure is purely mechanical: there is
no need to try and 'guess' what the extra term is in the solution of the difference
equation according to a given right-hand side. Simply do some algebraic
manipulations to get $X$ and then use the table of transforms to find the
corresponding $x_k$.
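
Both solutions obtained in this example can be confirmed by direct iteration. The following Python sketch uses integer arithmetic so that the comparison is exact; the function name `iterate` and the starting value $x_0 = 2$ are assumptions introduced for illustration.

```python
def iterate(forcing, x0, steps):
    """Return [x_0, ..., x_steps] for x_{k+1} = 3*x_k + forcing(k)."""
    xs = [x0]
    for k in range(steps):
        xs.append(3 * xs[-1] + forcing(k))
    return xs

x0 = 2  # hypothetical initial value

# Right-hand side 4^k: expected solution 3^k x_0 + (4^k - 3^k)
xs4 = iterate(lambda k: 4 ** k, x0, 10)
print(all(xs4[k] == 3 ** k * x0 + 4 ** k - 3 ** k for k in range(11)))

# Right-hand side 3^k: expected solution 3^k x_0 + k 3^(k-1),
# written as k * 3**k // 3 to stay within the integers when k = 0
xs3 = iterate(lambda k: 3 ** k, x0, 10)
print(all(xs3[k] == 3 ** k * x0 + k * 3 ** k // 3 for k in range(11)))
```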


EXERCISE 1.27 Use the z-transform to solve the two difference equations in Exercise
1.13.


Sometimes the algebra needed is a bit more complicated, as we now see.


EXAMPLE 1.17


Return to the equation

$$x_{k+1} = 3x_k + k, \qquad x_0 = 4$$

previously seen in Exercise 1.16(a). Taking the transform of both sides gives

$$zX - 4z = 3X + \frac{z}{(z-1)^2}$$

so that

$$X = \frac{4z}{z - 3} + \frac{z}{(z-1)^2(z-3)} \qquad (1.50)$$

The inverse of the first term is $4(3)^k$, but the second term on the right-hand side
in (1.50) does not appear in Table 1.1. One way round the difficulty would be to
get hold of a book containing a more comprehensive table of transforms! If this
is not possible, we need to use what is called 'the method of partial fractions'.
This consists of breaking up the nasty term in (1.50) into simpler fractions
which are listed in the table.

First write

$$\frac{z}{(z-1)^2(z-3)} = z\left[\frac{a}{(z-1)^2} + \frac{b}{z-1} + \frac{c}{z-3}\right] \qquad (1.51)$$

where $a$, $b$ and $c$ are constants to be determined. You should notice two things
about (1.51). First, a factor $z$ is kept outside the square brackets because all the
transforms in Table 1.1 contain a factor $z$ in the numerator; secondly, it is
necessary to include a term over $(z-1)$ as well as one over $(z-1)^2$, as we'll see
shortly. The terms inside the square brackets are the partial fractions which
when added together give the left-hand side, that is

$$\frac{1}{(z-1)^2(z-3)} \equiv \frac{a}{(z-1)^2} + \frac{b}{z-1} + \frac{c}{z-3} \qquad (1.52)$$

You may have spotted that (1.52) contains an identity sign instead of an
'equals'. This is because one side is simply a rearrangement of the other. We
can therefore multiply both sides by $(z-1)^2(z-3)$ so as to remove fractions,
giving

$$1 \equiv a(z-3) + b(z-1)(z-3) + c(z-1)^2 \qquad (1.53)$$

Remember that (1.53) is an identity, so in particular it holds for any values of $z$.
We choose values which make some terms zero: clearly $z = 1$ reduces (1.53) to

$$1 = a(1 - 3)$$

so that $a = -\tfrac{1}{2}$, and similarly setting $z = 3$ gives

$$1 = c(3 - 1)^2$$

so that $c = \tfrac{1}{4}$. Finally, to get the value of $b$, we look at the term in $z^2$ in (1.53):
this is $(b + c)z^2$ on the right and zero on the left, so $b + c = 0$ and hence
$b = -c = -\tfrac{1}{4}$. This shows that we couldn't have started with $b = 0$ in (1.51).
Putting these values for $a$, $b$ and $c$ back into (1.51) gives us

$$\frac{z}{(z-1)^2(z-3)} = -\frac{\tfrac{1}{2}z}{(z-1)^2} - \frac{\tfrac{1}{4}z}{z-1} + \frac{\tfrac{1}{4}z}{z-3}$$

Inverting this transform is now possible because all the terms appear in Table
1.1, giving

$$-\tfrac{1}{2}k - \tfrac{1}{4} + \tfrac{1}{4}(3)^k$$

Adding the inverse $4(3)^k$ of the first term in (1.50), we obtain $x_k = \tfrac{17}{4}(3)^k - \tfrac{1}{2}k - \tfrac{1}{4}$.
This completes the solution of the original difference equation, and agrees with
that given for Exercise 1.16(a).
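
The partial fractions and the final solution can both be checked with exact rational arithmetic. The sketch below is illustrative only: the function names and the sample values of $z$ are assumptions, and the identity (1.52) is merely tested at a few points rather than proved.

```python
from fractions import Fraction

a, b, c = Fraction(-1, 2), Fraction(-1, 4), Fraction(1, 4)

def lhs(z):
    """Left-hand side of (1.52)."""
    return Fraction(1) / ((z - 1) ** 2 * (z - 3))

def rhs(z):
    """Right-hand side of (1.52) with the values of a, b, c found above."""
    return a / (z - 1) ** 2 + b / (z - 1) + c / (z - 3)

def x_closed(k):
    """Complete solution of x_{k+1} = 3*x_k + k with x_0 = 4."""
    return Fraction(17, 4) * 3 ** k - Fraction(1, 2) * k - Fraction(1, 4)

# Test the identity at a few sample points (avoiding z = 1 and z = 3)
samples = [Fraction(2), Fraction(5), Fraction(-7, 3)]
print([lhs(z) == rhs(z) for z in samples])

# Compare the closed form with direct iteration of the difference equation
x = Fraction(4)
for k in range(8):
    print(k, x, x_closed(k))
    x = 3 * x + k
```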


EXERCISE 1.28
(a) Find the inverse of the transform

$$\frac{z(2z-3)}{(z+2)(z-1)^2}$$

Use this result to solve the difference equation in Exercise 1.16(b).
(b) Show that

$$Z(k^2 + 3k) = \frac{2z(2z-1)}{(z-1)^3}$$

Use this result to solve the difference equation in Exercise 1.16(c).


Let’s now move on to second-order difference equations. Obviously we’re going 
to need the z-transform of the sequence $x_{k+2}$. Remembering that we start counting at
$k = 0$, the definition (1.41) gives

$$Z(x_{k+2}) = x_2 + \frac{x_3}{z} + \frac{x_4}{z^2} + \cdots = z^2\left(x_0 + \frac{x_1}{z} + \frac{x_2}{z^2} + \cdots\right) - z^2x_0 - zx_1$$
$$= z^2X - z^2x_0 - zx_1 \qquad (1.54)$$

where as usual $X$ is the transform of the original sequence $x_k$, whose first two values
are $x_0$ and $x_1$. It is interesting to see that the transform of the second difference
involves multiplication of $X$ by $z^2$, just as the transform of the first difference
involved $zX$. This correspondence can be continued, as you may care to try in the
following exercise.


EXERCISE 1.29 Show that the z-transform of the sequence $x_{k+3}$, $k = 0, 1, 2, 3, \ldots$, is

$$z^3X - z^3x_0 - z^2x_1 - zx_2$$

where $Z(x_k) = X$.


We can now apply (1.54) to the solution of second-order difference equations. 


EXAMPLE 1.18


Let's go back to the equation (1.28), namely

$$x_{k+2} + 5x_{k+1} + 6x_k = 0$$

subject to $x_0 = 2$, $x_1 = 3$. Taking the z-transform produces

$$(z^2X - 2z^2 - 3z) + 5(zX - 2z) + 6X = 0$$

which can be rearranged as

$$(z^2 + 5z + 6)X = 2z^2 + 13z$$

so that

$$X = \frac{2z^2}{(z+2)(z+3)} + \frac{13z}{(z+2)(z+3)}$$

Using the last two entries in Table 1.1, the sequence corresponding to $X$ is

$$x_k = \frac{2[(-2)^{k+1} - (-3)^{k+1}]}{-2 + 3} + \frac{13[(-2)^k - (-3)^k]}{-2 + 3}$$
$$= [2(-2)^{k+1} + 13(-2)^k] - [2(-3)^{k+1} + 13(-3)^k]$$
$$= (13 - 4)(-2)^k - (13 - 6)(-3)^k$$
$$= 9(-2)^k - 7(-3)^k$$

which agrees with the result found earlier in Example 1.7.


EXERCISE 1.30 Repeat Exercise 1.18 using the z-transform and the results in Exercise
1.25(a) and (b).


EXAMPLE 1.19


Now return to

$$x_{k+2} + 6x_{k+1} + 9x_k = 0, \qquad x_0 = -1, \quad x_1 = 1$$

considered earlier in Example 1.9. Applying (1.54) gives

$$(z^2X + z^2 - z) + 6(zX + z) + 9X = 0$$

which after some algebraic manipulation becomes

$$X = \frac{-z^2 - 5z}{(z+3)^2} \qquad (1.55)$$

We are not quite ready to use Table 1.1, since it doesn't contain a term of
the type $z^2/(z-c)^2$. However, if we write (1.55) as

$$X = \frac{-z^2 - 3z - 2z}{(z+3)^2} = \frac{-z(z+3) - 2z}{(z+3)^2} = -\frac{z}{z+3} - \frac{2z}{(z+3)^2}$$

then we can read off from Table 1.1

$$x_k = -(-3)^k + \tfrac{2}{3}k(-3)^k$$

which is the result obtained earlier in Example 1.9.


EXERCISE 1.31 Determine the values of $a$, $b$ and $c$ such that

$$\frac{z^2 + 5z + 3}{(z+5)(z-1)^2} = \frac{a}{(z-1)^2} + \frac{b}{z-1} + \frac{c}{z+5}$$

Use this result to solve the equation (1.37) subject to the initial conditions $x_0 = 1$,
$x_1 = 2$. Compare your answer to that in Example 1.11.


The preceding examples and exercises should have given you the flavour of the z- 
transform method for solving linear difference equations. It’s worth repeating that the 
essential idea is to transform a difference equation into an algebraic equation, whose 
solution is then ‘inverted’ using a table of transforms to give the desired solution of the 
difference equation. It’s important to realize that the solution of the algebraic equation 
is a routine operation: there is no need to make special provision for awkward cases, as 
was required in the procedures of the previous section. Indeed, if you have access to a 
computer algebra package then doing the algebra is completely painless! However, the 
aim of this section is not to make you an expert at solving difference equations — after 
all, there are entire books devoted to z-transforms and their applications, which 
incidentally cover a lot more than merely solving difference equations. 

The general concept of ‘transforms’ is a powerful one in mathematics, and the 
corresponding method for solving linear differential equations, called the Laplace 
transform, has many similarities to the z-transform. 


EXERCISE 1.32 Verify the identity

$$\frac{2z}{(z-1)^3} = \frac{z(z+1)}{(z-1)^3} - \frac{z}{(z-1)^2}$$

Use this to find the solution of (1.36) subject to $a = 2$, $x_0 = 0$, $x_1 = 0$. (Compare with
the second result in Exercise 1.21.)


EXERCISE 1.33 Return to the problem of the roll of kitchen foil in Exercise 1.17. Solve
the difference equation in part (a) using the z-transform. (Hint: find constants $a$ and $b$
such that $X = az/(z-1)^2 + bz(z+1)/(z-1)^3$.)


1.4 MATRIX MODELS 


In Examples 1.4 and 1.5 in Section 1.1 we saw how the notation of matrix algebra 
could be used to describe some discrete time models. We now develop this idea 
further for some more complicated problems, and show how relevant properties of 
matrices can be utilized. 


@ EXAMPLE 1.20 


Population models are a particular favourite in this area. Let's consider one 
which traces the numbers of females in a population of blue whales. The 
females are divided up into four age groups, and the time period is taken to be 
4 years. We use x(k), i=1,2,3,4, k=0,1,2,..., to denote the number of 
females in age group / at the beginning of the kth period. Ecological studies 
have shown that the mortality rate over a 4 year period is 43% for all age 
groups. The studies also found that the females do not give birth until they are 
at least 4 years old, and the average numbers of female calves born to a female 
in group j over a 4 year period are as follows: 


Age group i 2 3 4 


Age in years 4-7 8-11 12-15 
Average number of calves b, 0.63 1.00 0.90 


Age group 1 consists of females aged 0-3 years. After k+1 time periods have 
passed, the number of females x,(k+ 1) in this group must be composed of the 
calves born in the previous time period. From the table we see that 0.63 x,(k) 
calves are born to females in group 2, 1.00x,(k) calves are born to females in 
group 3 and 0.90x,(k) to females in group 4. We therefore have 


X,(k+ 1) = 0.63 x, (k) + X_(k) + 0.90.x, (k) (1.56) 


Now move to the females in group 2. These simply consist of those 57% in 
group 1 who survive from the previous period, so that 


%(k+ 1) = 0.57 x, (k) 
The same argument applies to the next age group, giving 
X,(k+ 1) = 0.57 x, (k) 
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Finally, assuming that no female lives longer than four time periods (16 
years) it follows similarly that the oldest age group satisfies 


X,(k+ 1) = 0.57 x(k) 


We can now write the above four equations in the following combined form: 


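$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ x_3(k+1) \\ x_4(k+1) \end{bmatrix} = \begin{bmatrix} 0 & 0.63 & 1 & 0.90 \\ 0.57 & 0 & 0 & 0 \\ 0 & 0.57 & 0 & 0 \\ 0 & 0 & 0.57 & 0 \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \\ x_4(k) \end{bmatrix} \tag{1.57}$$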
or simply 
x(k+1)= Ax(k) (1.58) 


where A is the 4x 4 matrix and x(k) is the 4x 1 column vector on the right-hand 
side in (1.57). The elements of A are obtained by picking out the appropriate 
coefficients in the equations. Specifically, if the $i$th row of A ($i = 1, 2, 3, 4$) is
denoted by $a_{i1}, a_{i2}, a_{i3}, a_{i4}$, then the $i$th row of the product Ax(k) is

$$a_{i1}x_1(k) + a_{i2}x_2(k) + a_{i3}x_3(k) + a_{i4}x_4(k) \tag{1.59}$$

For example, comparing (1.59) with (1.56) reveals that when $i = 1$ then $a_{11} = 0$,
$a_{12} = 0.63$, $a_{13} = 1$, $a_{14} = 0.90$ and these elements comprise the first row of A in
(1.57).


The equation (1.58) is called a matrix difference equation; in general A can be a
square $n \times n$ matrix having n rows and n columns, in which case x(k) is a column
vector with n components. When n = 1 the equation (1.58) reduces to the scalar
equation first seen at the beginning of the chapter in (1.2); the case n = 2 appeared
in (1.12) and (1.13).


EXERCISE 1.34 In a model of a redwood forest the trees are divided into three age groups: 


(1) young trees aged 0 to 200 years; 
(2) mature trees aged 200 to 800 years; 
(3) old trees aged more than 800 years. 


The unit of time k is taken to be 50 years. It is assumed that the trees are uniformly 
distributed in age throughout each group. Thus, for example, one-quarter of trees in 
group 1 move into group 2 every 50 years. 

In the absence of felling it is reasonable to assume that redwoods die only of old 
age, and in every 50 year period it is found that one-third of trees in group 3 die. 
Observations have also shown that in each 50 year period: 


each tree in group 1 produces on average 10.25 new trees; 
each tree in group 2 produces on average 25 new trees; 
each tree in group 3 produces on average 5 new trees.

Let $x_i(k)$, $i = 1, 2, 3$, denote the number of trees in group $i$ at the start of the $k$th
period. Show that

$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ x_3(k+1) \end{bmatrix} = \begin{bmatrix} 11 & 25 & 5 \\ \tfrac{1}{4} & \tfrac{11}{12} & 0 \\ 0 & \tfrac{1}{12} & \tfrac{2}{3} \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}$$


We can now give the full explanation of what is meant by the product of a 
matrix A and a column vector x(k). The expression (1.59) is a special case of the 
general rule. This states that if the element in row $i$ and column $j$ of A is denoted by
$a_{ij}$, and if the n components of x(k) are $x_1(k), x_2(k), \ldots, x_n(k)$, then the product
Ax(k) is also a column vector whose $i$th component is

$$a_{i1}x_1(k) + a_{i2}x_2(k) + \cdots + a_{in}x_n(k), \quad i = 1, 2, \ldots, n \tag{1.60}$$


What (1.60) says is that to get the ith term in the product Ax(k) you take the ith row 
of A and multiply it term by term with the elements in x(k) — that is, the first 
element in the row of A is multiplied by the first element in the column vector, then 
the second elements are multiplied together, and so on. 
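
The rule (1.60) translates almost word for word into a short program. The following is a minimal sketch in Python, applied to the blue whale matrix of (1.57); the function name `mat_vec` and the starting numbers in `x0` are assumptions made purely for illustration and do not come from the ecological data.

```python
def mat_vec(A, x):
    """Return the product Ax: component i is row i of A times x, term by term (1.60)."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

# Matrix A of the blue whale model, taken from (1.57).
A = [
    [0.00, 0.63, 1.00, 0.90],
    [0.57, 0.00, 0.00, 0.00],
    [0.00, 0.57, 0.00, 0.00],
    [0.00, 0.00, 0.57, 0.00],
]

# Hypothetical numbers of females in the four age groups at k = 0.
x0 = [100, 100, 100, 100]

# One time period (4 years) later: x(1) = A x(0).
x1 = mat_vec(A, x0)
print(x1)
```

The first component of `x1` is 0.63 x 100 + 1 x 100 + 0.90 x 100 = 253 calves, and each of the other components is 57% of the group below it, up to the usual rounding of decimal arithmetic on a computer.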


EXERCISE 1.35 Evaluate the following products using (1.60): 


(a)}1 -1 4]/1 
ee Sin )I3 
0 —6 1)(2 
0) eS | 
i NIN scs(@oe¥ || 
Oe t= 1h =3 
a 0 6: 4 


EXAMPLE 1.21


In a certain animal population the oldest age attained by females is 15 years.
Suppose that the population is divided into three age groups each of duration 5
years, with $x_i(k)$, $i = 1, 2, 3$, denoting the number of females in group $i$ at the
beginning of the $k$th period. Suppose the equation (1.58) in this case has

$$A = \begin{bmatrix} 0 & 3 & 4 \\ \tfrac{1}{4} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{bmatrix}, \qquad x(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}$$


and that initially (k=0) there are respectively 1000, 900 and 800 females in each 
of the three age groups. After 5 years (i.e. one time period) have elapsed the 
numbers in each age group are given by substituting k=0 into (1.58) to give 


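x(1) = Ax(0)

Writing this out in full produces

$$\begin{bmatrix} x_1(1) \\ x_2(1) \\ x_3(1) \end{bmatrix} = A \begin{bmatrix} 1000 \\ 900 \\ 800 \end{bmatrix} = \begin{bmatrix} 0 \times 1000 + 3 \times 900 + 4 \times 800 \\ \tfrac{1}{4} \times 1000 + 0 \times 900 + 0 \times 800 \\ 0 \times 1000 + \tfrac{1}{2} \times 900 + 0 \times 800 \end{bmatrix} = \begin{bmatrix} 5900 \\ 250 \\ 450 \end{bmatrix}$$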


where we have used the multiplication rule in (1.60) with n=3. Thus after 5 
years there are 5900 females aged between 0 and 5 years, 250 between 5 and 
10 years and 450 between 10 and 15 years. 


EXERCISE 1.36 In the preceding example, determine the numbers of females in each of 
the age groups after a further 5 and 10 years have elapsed by substituting k= 1 and 
k=2 into (1.58). 


The procedure used to solve Example 1.21 and Exercise 1.36 can be followed to 
solve (1.58) when A is a general $n \times n$ matrix. Simply substitute successive values
for k into (1.58) starting at zero, just as we did for the scalar equation at the
beginning of the chapter. We get

k = 0:  x(1) = Ax(0)
k = 1:  x(2) = Ax(1) = $A^2$x(0)

and continuing this process shows that the solution of (1.58) is

$$x(k) = A^k x(0), \quad k = 1, 2, 3, \ldots \tag{1.61}$$


which is the matrix version of (1.3). Remember, however, that when we looked at 
the scalar solution in more detail in Section 1.2, we had to be careful about what 
happens to the solution x(k) as k becomes very large. We saw when n = 1, in which
case A in (1.61) is a scalar $a$, then $a^k \to 0$ as $k \to \infty$ if $|a| < 1$, and $a^k \to \infty$ as
$k \to \infty$ if $|a| > 1$. We need to find what replaces this modulus $|a|$ of a real or
complex number for the matrix equation case. The easiest situation to deal with is
when A is a diagonal matrix, that is, the only non-zero entries in A are on the
principal diagonal (northwest to southeast). For example, a 3 x 3 diagonal matrix is

$$A = \begin{bmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{bmatrix} \tag{1.62}$$

Multiplying A in (1.62) by itself gives

$$A^2 = AA = \begin{bmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{bmatrix} \begin{bmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{bmatrix} = \begin{bmatrix} a_1^2 & 0 & 0 \\ 0 & a_2^2 & 0 \\ 0 & 0 & a_3^2 \end{bmatrix} \tag{1.63}$$


In obtaining (1.63) we have used the following rule for multiplying matrices, which 
is really the same as our earlier rule in (1.60) for multiplying together a matrix and a 
vector. 


We simply multiply the first matrix with each of the columns of the second 
matrix in turn. 


For example, with two general 3 x 3 matrices A and B we have 


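$$AB = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix} \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{bmatrix} = \Bigg[ \underbrace{A \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix}}_{\text{1st column}} \;\; \underbrace{A \begin{bmatrix} b_{12} \\ b_{22} \\ b_{32} \end{bmatrix}}_{\text{2nd column}} \;\; \underbrace{A \begin{bmatrix} b_{13} \\ b_{23} \\ b_{33} \end{bmatrix}}_{\text{3rd column}} \Bigg]$$

The first column of the product is A times the first column of B, and according to
(1.60) this is

$$A \begin{bmatrix} b_{11} \\ b_{21} \\ b_{31} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} + a_{13}b_{31} \\ a_{21}b_{11} + a_{22}b_{21} + a_{23}b_{31} \\ a_{31}b_{11} + a_{32}b_{21} + a_{33}b_{31} \end{bmatrix} \tag{1.64}$$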


where we have formed the term-by-term product of each of the rows of A with the 
first column of B. 

The second and third columns of AB are found in an exactly similar way to 
(1.64), using the second and third columns of B. 
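
The column-by-column rule can be sketched in a few lines of Python. The function names `mat_vec` and `mat_mul` and the diagonal entries 2, 3, 5 are illustrative choices, not taken from the text; the example simply checks the pattern of (1.63) for one particular diagonal matrix.

```python
def mat_vec(A, x):
    """Product of a matrix and a column vector, as in (1.60)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def mat_mul(A, B):
    """Form AB column by column: column j of AB is A times column j of B."""
    n_cols = len(B[0])
    columns = [mat_vec(A, [row[j] for row in B]) for j in range(n_cols)]
    # The columns were computed one at a time; transpose them back into rows.
    return [list(row) for row in zip(*columns)]

# A diagonal matrix of the form (1.62) with illustrative entries 2, 3, 5.
A = [[2, 0, 0], [0, 3, 0], [0, 0, 5]]
print(mat_mul(A, A))  # [[4, 0, 0], [0, 9, 0], [0, 0, 25]]
```

As (1.63) predicts, squaring the diagonal matrix just squares each diagonal entry.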


EXERCISE 1.37 Use the rule (1.64) to evaluate the matrix products 


(@—) 1-1. 4//1 0.6 
2) gi de Oil: =5 yeh 
OG F249 
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(i= 24s Oill 210) 4g 

3 ib ef 3 1 = 6 ll 

E10) 2 1) 35 — 18 0 

=) 0 6) 4 OD=19 9 
Compare your results with Exercise 1.35. 


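Now let’s return to the diagonal matrix A in (1.62). Continuing as in (1.63) you
can easily check that

$$A^3 = A \cdot A^2 = \begin{bmatrix} a_1^3 & 0 & 0 \\ 0 & a_2^3 & 0 \\ 0 & 0 & a_3^3 \end{bmatrix}$$

and in general

$$A^k = \begin{bmatrix} a_1^k & 0 & 0 \\ 0 & a_2^k & 0 \\ 0 & 0 & a_3^k \end{bmatrix}, \quad k = 2, 3, 4, \ldots$$

With n = 3, the solution (1.61) is therefore

$$\begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix} = A^k \begin{bmatrix} x_1(0) \\ x_2(0) \\ x_3(0) \end{bmatrix} = \begin{bmatrix} a_1^k x_1(0) \\ a_2^k x_2(0) \\ a_3^k x_3(0) \end{bmatrix} \tag{1.65}$$

again using the rule (1.64). Equating components on each side of (1.65) shows that

$$x_i(k) = a_i^k x_i(0), \quad i = 1, 2, 3$$

This is now in exactly the same form as the solution (1.3) for the scalar case, and we
can therefore conclude that as $k \to \infty$

$$x_i(k) \to 0 \ \text{ if } |a_i| < 1, \qquad x_i(k) \to \infty \ \text{ if } |a_i| > 1 \tag{1.66}$$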


Although we have only worked with the 3 x 3 matrix in (1.62), you should be able to 
see that the argument used to derive (1.66) holds for any value of n when A is a 
diagonal matrix with diagonal entries $a_1, a_2, \ldots, a_n$.
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To handle the situation when A is an arbitrary matrix, not a diagonal form, it’s 
necessary to delve into a bit of matrix theory. As usual in this book, we’ll keep the 
treatment informal — for full details and a mathematically rigorous coverage you'll 
have to go to an appropriate book in the reading list at the end of the chapter. The 
basic idea is to change the coordinates in our difference equation (1.58) from
$x_1, x_2, \ldots, x_n$ to $y_1, y_2, \ldots, y_n$ according to a linear relationship; that is, each $x_i$ is a
linear combination of the $y$s, which means, for example, that

$$x_1 = t_{11}y_1 + t_{12}y_2 + \cdots + t_{1n}y_n \tag{1.67}$$

where the $t$s are constants (notice that for simplicity of notation we have temporarily
suppressed the dependence of the $x$s and the $y$s on the variable k). Putting together
the expressions like (1.67) for each of the $x$s, we get $x = Ty$ where T is the $n \times n$
matrix whose element in row $i$, column $j$ is $t_{ij}$, and x and y are column vectors with
components $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ respectively. To express y in terms of x, we
would somehow like to ‘divide’ by T, to obtain $y = x \div T$. The way we do this is to
define the inverse $T^{-1}$ of T. This matrix satisfies the conditions

$$T^{-1}T = TT^{-1} = I$$

where $I$ is the $n \times n$ unit matrix, which has diagonal form like (1.62) with ones all
along the diagonal. You can easily check using our multiplication rule that

$$XI = IX = X$$

for any $n \times n$ matrix X. This accounts for the name ‘unit matrix’, since $I$ behaves for
matrices in the same way that the unit 1 behaves for scalars. Similarly, the name
‘inverse’ of T is used because $T^{-1}$ plays exactly the same role for matrices as does
the inverse $t^{-1}$ (= 1/t) of a scalar t. We can therefore rearrange $x = Ty$, multiplying
both sides on the left by $T^{-1}$, to obtain

$$y = T^{-1}x$$

(be careful to write the inverse on the correct side: $xT^{-1}$ does not make sense).


EXERCISE 1.38 Verify in each of the following cases that $AB = BA = I$, where $I$ is the
2 x 2 unit matrix. This shows that $B = A^{-1}$.


a\s° 2 = i | 
of th e[a 3] 
waa[f th aeaf 2 FL 
la My) 3 4/7) 3 


Let’s now see what happens to 
x(k+ 1) = Ax(k) 

when we change the variables from x to y. Firstly, since x(k) = Ty(k), we have

$$x(k+1) = Ax(k) = ATy(k)$$

and then using the argument above gives

$$y(k+1) = T^{-1}x(k+1) = T^{-1}ATy(k) \tag{1.68}$$


This difference equation in the new variables $y_1(k), y_2(k), \ldots, y_n(k)$ can therefore be
written

$$y(k+1) = Dy(k), \quad k = 0, 1, 2, \ldots \tag{1.69}$$

where $D = T^{-1}AT$. The crucial fact which is relevant for us is that in most cases of
practical interest it’s possible to find a matrix T which possesses an inverse $T^{-1}$ and
is such that D is diagonal, that is

$$T^{-1}AT = D = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{bmatrix} \tag{1.70}$$

Although T is not unique, the non-zero elements along the diagonal of D are
unique for a given matrix A, although for different Ts they may come out in
different orders. These numbers $\lambda_1, \lambda_2, \ldots, \lambda_n$ are therefore specific to a given
matrix A and are called the eigenvalues of A. Words involving ‘eigen-’ are rather
ugly Anglo-German combinations but are widely used; the term characteristic
roots of A is an alternative expression. Since y(k) satisfies (1.69), where D is the
diagonal matrix in (1.70), we know from (1.66) that the behaviour of y(k) as
$k \to \infty$ is

$$y_i(k) \to 0 \ \text{ if } |\lambda_i| < 1, \qquad y_i(k) \to \infty \ \text{ if } |\lambda_i| > 1$$


Recall that each component $x_i(k)$ is a linear combination of all the components of
y(k), as displayed in (1.67) for $x_1(k)$. It therefore follows that as $k \to \infty$ we must
have all the $y_i(k) \to 0$ in order to make each $x_i(k) \to 0$. That is, as $k \to \infty$ we have

$$x(k) \to 0 \ \text{ provided all } |\lambda_i| < 1 \tag{1.71}$$

Similarly, if all $|\lambda_i| > 1$ then $x(k) \to \infty$ as $k \to \infty$. You might like to think about what
happens if some of the $\lambda_i$ have modulus less than one, and the remainder have
modulus greater than one.

In order to apply the result (1.71), we need to be able to compute these 
mysterious quantities called eigenvalues for any given matrix A. This is a subject 
which can and does fill whole books! Nevertheless, let’s see how far we can get 
without becoming too technical. The key equation is (1.70), called diagonalization 
of A, whereby A is converted into a diagonal matrix D according to

$$T^{-1}AT = D$$

Multiplying this (on the left) by T gives

$$AT = TD \tag{1.72}$$


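since $TT^{-1} = I$ and $IA = A$. Suppose the matrix T has columns $t_1, t_2, \ldots, t_n$. For
simplicity consider n = 2, so (1.72) becomes

$$A \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix} = \begin{bmatrix} t_{11} & t_{12} \\ t_{21} & t_{22} \end{bmatrix} \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} = \begin{bmatrix} \lambda_1 t_{11} & \lambda_2 t_{12} \\ \lambda_1 t_{21} & \lambda_2 t_{22} \end{bmatrix}$$

using the multiplication rule given earlier in (1.60). Equate the first columns on each
side of this equation to get

$$A \begin{bmatrix} t_{11} \\ t_{21} \end{bmatrix} = \begin{bmatrix} \lambda_1 t_{11} \\ \lambda_1 t_{21} \end{bmatrix}$$

or

$$At_1 = \lambda_1 t_1$$

using the fact that multiplying a vector by a scalar means that each element is
multiplied by the scalar. Similarly using the second columns gives $At_2 = \lambda_2 t_2$.
Generalizing this approach, when A is an $n \times n$ matrix we get n equations of the
form

$$Az = \lambda z \tag{1.73}$$

to be solved for the scalar $\lambda$ and the column vector z. In general there will be n such
values of $\lambda$, called eigenvalues, and for each value there will be a corresponding
vector z called an eigenvector (these, however, are not unique).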


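EXAMPLE 1.22

Let’s solve (1.73) for n = 2 in the following case:

$$\underbrace{\begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix}}_{A} \underbrace{\begin{bmatrix} z_1 \\ z_2 \end{bmatrix}}_{z} = \lambda \underbrace{\begin{bmatrix} z_1 \\ z_2 \end{bmatrix}}_{z}$$

Using the multiplication rule (1.60), this becomes

$$\begin{bmatrix} z_1 + 3z_2 \\ 2z_1 + 2z_2 \end{bmatrix} = \begin{bmatrix} \lambda z_1 \\ \lambda z_2 \end{bmatrix}$$

Equating terms on each side gives us

$$z_1 + 3z_2 = \lambda z_1, \qquad 2z_1 + 2z_2 = \lambda z_2$$

or, after rearrangement,

$$\begin{aligned} (\lambda - 1)z_1 - 3z_2 &= 0 \\ -2z_1 + (\lambda - 2)z_2 &= 0 \end{aligned} \tag{1.74}$$

We have two simultaneous equations which involve three unknowns $\lambda$, $z_1$ and
$z_2$; we'll see that this means the solution is not unique. Let's eliminate $z_2$ by
multiplying the first equation in (1.74) by $(\lambda - 2)$ and the second equation by 3:

$$\begin{aligned} (\lambda - 2)(\lambda - 1)z_1 - 3(\lambda - 2)z_2 &= 0 \\ -6z_1 + 3(\lambda - 2)z_2 &= 0 \end{aligned}$$

Now add these equations to obtain

$$[(\lambda - 2)(\lambda - 1) - 6]z_1 = 0 \tag{1.75}$$

We cannot have $z_1 = 0$, for if this were the case the first equation in (1.74) would
give $z_2 = 0$ as well, which is completely uninteresting! We therefore conclude that the
content of the square bracket in (1.75) is zero, that is

$$\lambda^2 - 3\lambda - 4 = 0 \tag{1.76}$$

or

$$(\lambda + 1)(\lambda - 4) = 0$$

This shows that the two eigenvalues of A are the roots of the quadratic
equation (1.76), namely $\lambda_1 = -1$, $\lambda_2 = 4$. We now go back to (1.74) and solve the
equations for $z_1$ and $z_2$ using each of these two values of $\lambda$. With $\lambda = -1$, (1.74)
becomes

$$\begin{aligned} -2z_1 - 3z_2 &= 0 \\ -2z_1 - 3z_2 &= 0 \end{aligned} \tag{1.77}$$

You can see that a convenient solution of (1.77) is $z_1 = 3$, $z_2 = -2$. However, as
mentioned above, the solution of (1.77) is not unique: clearly $z_1 = 3p$, $z_2 = -2p$
is also a solution for any scalar p.

When $\lambda = 4$ the equations (1.74) become

$$\begin{aligned} 3z_1 - 3z_2 &= 0 \\ -2z_1 + 2z_2 &= 0 \end{aligned} \tag{1.78}$$

for which a convenient solution is $z_1 = 1$, $z_2 = 1$. We have therefore found
eigenvectors

$$\begin{bmatrix} 3 \\ -2 \end{bmatrix}, \qquad \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

corresponding to the eigenvalues $\lambda_1$, $\lambda_2$ respectively. These eigenvectors are the
columns of the matrix T in (1.70), which is therefore

$$T = \begin{bmatrix} 3 & 1 \\ -2 & 1 \end{bmatrix} \tag{1.79}$$

To complete (1.70) we need the inverse $T^{-1}$, and it’s useful to quote here the
formula for the inverse of any 2 x 2 matrix:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{1.80}$$

provided $ad - bc \neq 0$. To verify that (1.80) is correct, simply multiply the matrix
and its inverse together, using the rule in (1.64):

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix} = \frac{1}{ad - bc} \begin{bmatrix} ad - bc & -ab + ba \\ cd - dc & -cb + ad \end{bmatrix} = \frac{1}{ad - bc} \begin{bmatrix} ad - bc & 0 \\ 0 & -cb + ad \end{bmatrix} = I$$

as required. Two examples of (1.80) were given in Exercise 1.38. We can now
apply (1.80) to (1.79) to get

$$T^{-1} = \frac{1}{3 + 2} \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix} = \frac{1}{5} \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix}$$

When T in (1.79) and its inverse above are put into (1.70) you should verify by
working out the products that we do indeed get the diagonal matrix D:

$$T^{-1}AT = \frac{1}{5} \begin{bmatrix} 1 & -1 \\ 2 & 3 \end{bmatrix} A \begin{bmatrix} 3 & 1 \\ -2 & 1 \end{bmatrix} = \begin{bmatrix} -1 & 0 \\ 0 & 4 \end{bmatrix} \tag{1.81}$$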
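
The verification requested in Example 1.22 can also be carried out by machine. The sketch below is one possible way of doing it in Python, using exact fractions so that no rounding occurs; the helper names `mat_mul` and `inverse_2x2` are assumptions made for this illustration, with `inverse_2x2` coded directly from formula (1.80).

```python
from fractions import Fraction

def mat_mul(A, B):
    """Matrix product: entry (i, j) is row i of A times column j of B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def inverse_2x2(M):
    """Inverse of a 2 x 2 matrix with integer entries, using formula (1.80)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("no inverse exists because ad - bc = 0")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A = [[1, 3], [2, 2]]      # the matrix of Example 1.22
T = [[3, 1], [-2, 1]]     # eigenvectors as columns, as in (1.79)

T_inv = inverse_2x2(T)
D = mat_mul(mat_mul(T_inv, A), T)
print(D == [[-1, 0], [0, 4]])  # True
```

Replacing `T` by any other matrix whose columns are eigenvectors for the eigenvalues -1 and 4 (in that order) should print `True` as well.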


Having gone through this example in some detail, we can now expand on a few 
points which will enable us to solve any 2 x 2 example rather more succinctly. First, 
notice the condition that (1.80) must not have a zero denominator: the quantity
$ad - bc$ is called the determinant of the 2 x 2 matrix

$$X = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$

since it determines whether this matrix has an inverse, and is written det X or |X|.
Secondly, the equation (1.73) can be written as

$$(\lambda I - A)z = 0 \tag{1.82}$$

since $\lambda z = \lambda Iz$. Moreover, the condition that $z \neq 0$ can be shown to require that

$$\det(\lambda I - A) = 0 \tag{1.83}$$

As an illustration, consider the 2 x 2 matrix A in Example 1.22 for which

$$\lambda I - A = \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} - \begin{bmatrix} 1 & 3 \\ 2 & 2 \end{bmatrix} = \begin{bmatrix} \lambda - 1 & -3 \\ -2 & \lambda - 2 \end{bmatrix}$$

Hence (1.83) gives

$$0 = (\lambda - 1)(\lambda - 2) - (-3)(-2) = \lambda^2 - 3\lambda - 4$$

which is precisely what we found before in (1.76). The eigenvalues of A can
therefore be found by solving (1.83), which is called the characteristic equation of
A (the polynomial itself is called the characteristic polynomial of A). When A is a
general $n \times n$ matrix (1.83) still applies, and the characteristic polynomial has degree
n. However, we shan’t go into details of how to work out the determinant of a
general square matrix, although we shall consider shortly the cases n = 3 and n = 4.
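
For a 2 x 2 matrix with rows (a, b) and (c, d), expanding (1.83) gives the quadratic $\lambda^2 - (a + d)\lambda + (ad - bc) = 0$, so the eigenvalues follow from the usual quadratic formula. The short Python sketch below does exactly this; the function name `eigenvalues_2x2` is an assumption for illustration, and `cmath.sqrt` is used so that complex roots are handled as well as real ones.

```python
import cmath

def eigenvalues_2x2(M):
    """Roots of the characteristic equation det(lambda*I - M) = 0 for a 2 x 2 matrix.

    Expanding the determinant gives lambda^2 - (a + d)*lambda + (ad - bc) = 0.
    """
    (a, b), (c, d) = M
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace - root) / 2, (trace + root) / 2

# The matrix of Example 1.22; the two roots should be -1 and 4.
lam1, lam2 = eigenvalues_2x2([[1, 3], [2, 2]])
print(lam1, lam2)
```

The results are returned as complex numbers, so real eigenvalues appear with a zero imaginary part.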
Finally, there’s no need to worry about the non-unique solution for an
eigenvector: whatever you choose, the diagonal expression $T^{-1}AT$ will still come
out right. For example, suppose that in Example 1.22 we had chosen $z_1 = 6$, $z_2 = -4$
as a solution of (1.77), and $z_1 = 3$, $z_2 = 3$ as a solution of (1.78). The new matrix T
would then be

$$T = \begin{bmatrix} 6 & 3 \\ -4 & 3 \end{bmatrix}$$

with inverse given by (1.80) as

$$T^{-1} = \frac{1}{30} \begin{bmatrix} 3 & -3 \\ 4 & 6 \end{bmatrix} = \begin{bmatrix} \tfrac{1}{10} & -\tfrac{1}{10} \\ \tfrac{2}{15} & \tfrac{1}{5} \end{bmatrix}$$

You should verify by multiplying out the product that we still get

$$T^{-1}AT = \begin{bmatrix} -1 & 0 \\ 0 & 4 \end{bmatrix}$$

as in (1.81).


EXERCISE 1.39 Determine the eigenvalues and associated eigenvectors for the matrix 


Hence obtain T and $T^{-1}$ in (1.70), and verify that $T^{-1}AT = D$.

Remember that an eigenvector z linked to a particular eigenvalue $\lambda$ satisfies
(1.73), namely $Az = \lambda z$. Multiplying both sides by A produces

$$A^2 z = A\lambda z = \lambda(Az) = \lambda^2 z$$

Continuing this process shows that $A^3 z = \lambda^3 z$ and in general

$$A^k z = \lambda^k z \tag{1.84}$$

for any positive integer k. Recall also that the solution of the matrix difference
equation (1.58), namely

$$x(k+1) = Ax(k), \quad k = 1, 2, 3, \ldots$$

was obtained in (1.61) as $x(k) = A^k x(0)$. It’s interesting to realize that if the initial
vector x(0) is an eigenvector z of A, then using (1.84) shows that the solution is

$$x(k) = A^k z = \lambda^k z$$

Moreover, if this particular eigenvalue has the value 1 then $\lambda^k = 1$, and the solution is
simply x(k) = z: that is, the vector x(k) remains at its initial value x(0) = z for all
subsequent time. This is called an equilibrium situation.


EXAMPLE 1.23


A local car rental company has two offices in neighbouring cities B and L. It is 
known on the basis of past experience that on a monthly basis 40% of rentals 
from the office in B are returned there and 60% are one-way rentals which are 
dropped off in L. Similarly, 70% of rentals from the office in L are returned there, 
whereas 30% are dropped off in B. The company operates a fleet of 90 cars, and 
would like to know the numbers which should be kept at each city depot. 

Let $x_k$, $y_k$ denote the number of cars at the depots in B and L respectively at
the beginning of month k, for k = 0, 1, 2, .... One month later the cars at B consist
of those returned there during the previous month (namely 40% of $x_k$), together
with those dropped off on a one-way rental from L (i.e. 30% of $y_k$) so we have

$$x_{k+1} = 0.4x_k + 0.3y_k, \quad k = 0, 1, 2, \ldots$$

Similarly, for the depot at L

$$y_{k+1} = \underbrace{0.6x_k}_{\text{cars from B}} + \underbrace{0.7y_k}_{\text{cars returned to L}}$$

We can write these equations in the combined form

$$\begin{bmatrix} x_{k+1} \\ y_{k+1} \end{bmatrix} = \underbrace{\begin{bmatrix} 0.4 & 0.3 \\ 0.6 & 0.7 \end{bmatrix}}_{A} \begin{bmatrix} x_k \\ y_k \end{bmatrix}, \quad k = 0, 1, 2, \ldots \tag{1.85}$$

which is precisely our standard equation X(k+1) = AX(k) with

$$X(k) = \begin{bmatrix} x_k \\ y_k \end{bmatrix}$$

Let's find the eigenvalues of A using (1.83), which gives

$$\begin{aligned} 0 = \det(\lambda I - A) &= \det \begin{bmatrix} \lambda - 0.4 & -0.3 \\ -0.6 & \lambda - 0.7 \end{bmatrix} \\ &= (\lambda - 0.4)(\lambda - 0.7) - (-0.3)(-0.6) \\ &= \lambda^2 - 1.1\lambda + 0.1 \\ &= (\lambda - 1)(\lambda - 0.1) \end{aligned}$$

Hence A has eigenvalues $\lambda_1 = 1$, $\lambda_2 = 0.1$. An eigenvector for $\lambda = 1$ is found by
solving (1.82), which becomes

$$\underbrace{\begin{bmatrix} 0.6 & -0.3 \\ -0.6 & 0.3 \end{bmatrix}}_{\lambda_1 I - A} \begin{bmatrix} z_1 \\ z_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

On multiplying this out, we get the same equation twice, namely

$$0.6z_1 - 0.3z_2 = 0$$

so we can take $z_1 = p$, $z_2 = 2p$ for any non-zero scalar p as the components of an
eigenvector for the eigenvalue $\lambda = 1$. We therefore conclude from our discussion
above that if we start with $x_0 = p$, $y_0 = 2p$ then $x_k = p$, $y_k = 2p$ for all
subsequent time, k = 1, 2, 3, .... Since the total number of cars is p + 2p = 90 we
have p = 30, so the company should begin with 30 cars at B and 60 at L; these
numbers will then be the same at the beginning of every subsequent month
(assuming that the previous pattern of customer usage does not alter) and this
is an equilibrium situation.
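
The equilibrium found in Example 1.23 can be checked numerically. The following Python sketch uses exact fractions for the entries of A in (1.85); the function name `step` and the alternative starting allocation (all 90 cars at B) are hypothetical choices made only to show what happens away from equilibrium.

```python
from fractions import Fraction as F

# Matrix A from (1.85), held as exact fractions to avoid rounding.
A = [[F(4, 10), F(3, 10)],
     [F(6, 10), F(7, 10)]]

def step(state):
    """Advance the numbers of cars (at B, at L) by one month using (1.85)."""
    x, y = state
    return (A[0][0] * x + A[0][1] * y, A[1][0] * x + A[1][1] * y)

# Starting at the equilibrium allocation leaves the numbers unchanged.
equilibrium = (F(30), F(60))
print(step(equilibrium) == equilibrium)  # True

# Hypothetical alternative start: the whole fleet of 90 cars is at B.
state = (F(90), F(0))
for month in range(1, 7):
    state = step(state)
    print(month, float(state[0]), float(state[1]))
```

In this run the gap between the current allocation and (30, 60) shrinks by a factor of 10 each month, which reflects the second eigenvalue 0.1 of A, while the total of 90 cars is preserved.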


EXERCISE 1.40 Verify by substituting $x_0 = p$, $y_0 = 2p$ into (1.85) that $x_k = p$, $y_k = 2p$ for
all values of $k \geq 1$.


EXERCISE 1.41 If the matrix A in (1.85) is changed to 


| 


show that it still has an eigenvalue equal to 1. 
Determine the numbers of cars to be kept at B and L for the equilibrium situation 
in this case. 


i= sie 
win wl 


EXERCISE 1.42 Show that any matrix of the form 


Pemer 


where a and f are positive scalars, has an eigenvalue equal to 1. 


We’ve seen that when n = 2 the determinant in (1.83) gives a quadratic equation
to be solved for $\lambda$. It turns out that when n = 3 we get a cubic equation in $\lambda$, when
n = 4 a fourth-degree equation, and so on. Let’s consider the barest outline of how a
3 x 3 determinant can be worked out. Let $x_{ij}$ denote the element in row $i$, column $j$ of
an arbitrary matrix X. We already know that for n = 2

$$\det X = \begin{vmatrix} x_{11} & x_{12} \\ x_{21} & x_{22} \end{vmatrix} = x_{11}x_{22} - x_{12}x_{21}$$

When n = 3, det X is defined in terms of three 2 x 2 determinants:

$$\det X = \begin{vmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \end{vmatrix} = x_{11}X_1 - x_{12}X_2 + x_{13}X_3 \tag{1.86}$$

where

$$X_1 = \begin{vmatrix} x_{22} & x_{23} \\ x_{32} & x_{33} \end{vmatrix}, \qquad X_2 = \begin{vmatrix} x_{21} & x_{23} \\ x_{31} & x_{33} \end{vmatrix}, \qquad X_3 = \begin{vmatrix} x_{21} & x_{22} \\ x_{31} & x_{32} \end{vmatrix} \tag{1.87}$$

To form $X_1$ we delete row 1, column 1 in det X;
to form $X_2$ we delete row 1, column 2 in det X;
to form $X_3$ we delete row 1, column 3 in det X.

Analogous formulae can be given for $n \geq 4$, but for further explanations and proofs
you’ll have to consult an appropriate book from the list at the end of the chapter. In
fact, for $n \geq 4$ evaluating determinants using formulae like (1.86) is not
recommended; a much better method relies on what is called ‘gaussian elimination’ (see
Section 3.5, Chapter 3).
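
The expansion (1.86) with the three 2 x 2 determinants in (1.87) is easy to code. The Python sketch below is one way of writing it; the function names `det2`, `minor` and `det3` and the test matrix are illustrative assumptions rather than anything prescribed in the text.

```python
def det2(m):
    """Determinant of a 2 x 2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def minor(X, col):
    """Delete row 1 and one column (numbered from 0) of a 3 x 3 matrix, as in (1.87)."""
    return [[X[i][j] for j in range(3) if j != col] for i in (1, 2)]

def det3(X):
    """Determinant of a 3 x 3 matrix by expansion along the first row (1.86)."""
    return (X[0][0] * det2(minor(X, 0))
            - X[0][1] * det2(minor(X, 1))
            + X[0][2] * det2(minor(X, 2)))

# An illustrative matrix (not from the text).
X = [[2, 0, 1],
     [1, 3, 2],
     [1, 1, 4]]
print(det3(X))  # 2*(12 - 2) - 0*(4 - 2) + 1*(1 - 3) = 18
```

Note the alternating signs +, -, + attached to the three terms, which are easy to get wrong when working by hand.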


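EXAMPLE 1.24


We'll find eigenvalues and corresponding eigenvectors for the matrix

$$A = \begin{bmatrix} 0 & -1 & 1 \\ 2 & 3 & 3 \\ -2 & 1 & 1 \end{bmatrix}$$

Write X for the matrix

$$\lambda I - A = \begin{bmatrix} \lambda & 0 & 0 \\ 0 & \lambda & 0 \\ 0 & 0 & \lambda \end{bmatrix} - A = \begin{bmatrix} \lambda & 1 & -1 \\ -2 & \lambda - 3 & -3 \\ 2 & -1 & \lambda - 1 \end{bmatrix}$$

We evaluate det X = det($\lambda I$ - A) using (1.86) as follows:

$$\begin{aligned} \det X &= \begin{vmatrix} \lambda & 1 & -1 \\ -2 & \lambda - 3 & -3 \\ 2 & -1 & \lambda - 1 \end{vmatrix} \\ &= \lambda \underbrace{\begin{vmatrix} \lambda - 3 & -3 \\ -1 & \lambda - 1 \end{vmatrix}}_{X_1} - \underbrace{\begin{vmatrix} -2 & -3 \\ 2 & \lambda - 1 \end{vmatrix}}_{X_2} - \underbrace{\begin{vmatrix} -2 & \lambda - 3 \\ 2 & -1 \end{vmatrix}}_{X_3} \\ &= \lambda[(\lambda - 3)(\lambda - 1) - (-3)(-1)] - [-2(\lambda - 1) - (-3)2] - [(-2)(-1) - 2(\lambda - 3)] \\ &= \lambda(\lambda^2 - 4\lambda) - (-2\lambda + 8) - (-2\lambda + 8) \\ &= \lambda^3 - 4\lambda^2 + 4\lambda - 16 \end{aligned} \tag{1.88}$$

By trying simple values you can check that $\lambda = 4$ is a root of this polynomial, so
det X has a factor $\lambda - 4$. Dividing the polynomial in (1.88) by this factor produces

$$\det(\lambda I - A) = (\lambda - 4)(\lambda^2 + 4) = (\lambda - 4)(\lambda + 2i)(\lambda - 2i)$$

Hence the eigenvalues of A are $\lambda_1 = 4$, $\lambda_2 = 2i$, $\lambda_3 = -2i$. If A is a real matrix, as is
almost invariably the case in practical applications, then any complex
eigenvalues always occur in complex conjugate pairs in the form $\alpha \pm i\beta$.

To find an eigenvector corresponding to $\lambda_1 = 4$, we solve the equation
(1.82), which here is

$$\underbrace{\begin{bmatrix} 4 & 1 & -1 \\ -2 & 1 & -3 \\ 2 & -1 & 3 \end{bmatrix}}_{\lambda_1 I - A} \underbrace{\begin{bmatrix} z_1 \\ z_2 \\ z_3 \end{bmatrix}}_{z} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}$$

Working out this product using the multiplication rule (1.60), we get

$$\begin{aligned} 4z_1 + z_2 - z_3 &= 0 \\ -2z_1 + z_2 - 3z_3 &= 0 \\ 2z_1 - z_2 + 3z_3 &= 0 \end{aligned}$$

Again the solution of these equations is not unique, since you can see that the
last two equations are identical (one is the other multiplied by -1). Subtract the
second equation from the first to get

$$6z_1 + 2z_3 = 0$$

for which a simple solution is $z_1 = 1$, $z_3 = -3$. From the first equation

$$z_2 = -4z_1 + z_3 = -4 - 3 = -7$$

so an eigenvector for $\lambda_1$ is

$$\begin{bmatrix} 1 \\ -7 \\ -3 \end{bmatrix}$$

EXERCISE 1.43 Determine eigenvectors corresponding to $\lambda_2$ and $\lambda_3$ for the matrix A in
Example 1.24.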


We will not develop any further the question of computation of eigenvalues 
of a matrix. Not only is powerful software available for this purpose on micro- 
computers, but also on some hand-held graphics calculators. It is interesting, 
however, to return to our population models. If you go back to Examples 1.20 
and 1.21 at the beginning of this section you will see that in each case the matrix 
A has the form 


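$$L = \begin{bmatrix} a_1 & a_2 & a_3 & \cdots & a_{n-1} & a_n \\ b_1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & b_2 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & b_{n-1} & 0 \end{bmatrix} \tag{1.89}$$

where $a_i \geq 0$ and $1 \geq b_i > 0$ for $i = 1, 2, 3, \ldots$.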

The matrix (1.89) is called a Leslie matrix, and we can generalize our models of 
female populations given in Examples 1.20 and 1.21 in the following way. Divide up 
the population into n age groups of equal duration. If the maximum age attained by 
any females is K years, then the time period has length K/n years. For example, in 
our blue whale model in Example 1.20 we had K=16, n=4 and the length of the 
time period was 4 years. Let $x_i(k)$ denote the number of females in group $i$ at time k,
and interpret the parameters in the matrix L in (1.89) as:

$a_i$ = the average number of daughters born to a female during the time she is in
the $i$th age group

$b_i$ = the fraction of females in the $i$th age group which survive to pass into the
$(i+1)$th group


Using exactly the same argument as in our two particular cases in Examples 1.20 
and 1.21 we can construct the equations which describe the way the population 
behaves. First, $x_1(k+1)$ is equal to the total of the daughters born to females in all
the age groups over the previous time period, so that

$$x_1(k+1) = a_1x_1(k) + a_2x_2(k) + \cdots + a_nx_n(k) \tag{1.90}$$

Secondly, the number of females in group $i+1$ at time $k+1$ is equal to the
proportion of group $i$ surviving from the previous time period, so that

$$x_{i+1}(k+1) = b_ix_i(k), \quad i = 1, 2, 3, \ldots, n-1 \tag{1.91}$$

Combining together the n equations represented by (1.90) and (1.91) gives

$$X(k+1) = LX(k) \tag{1.92}$$

where X(k) has components $x_1(k), x_2(k), \ldots, x_n(k)$, and L is the Leslie matrix in
(1.89).
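
Equations (1.90) and (1.91) can be coded directly, without ever writing down the full Leslie matrix. The Python sketch below is one possible way; the function name `leslie_step` is an assumption for illustration, and the data are the birth and survival parameters read off from the matrix A of Example 1.21, held as exact fractions.

```python
from fractions import Fraction as F

def leslie_step(a, b, x):
    """Advance the age-group numbers x by one time period.

    a holds the birth parameters a_1, ..., a_n and b holds the survival
    fractions b_1, ..., b_(n-1) of the Leslie matrix (1.89).
    """
    newborn = sum(a_i * x_i for a_i, x_i in zip(a, x))   # equation (1.90)
    survivors = [b_i * x_i for b_i, x_i in zip(b, x)]    # equation (1.91)
    return [newborn] + survivors

# Parameters and initial numbers from Example 1.21.
a = [F(0), F(3), F(4)]
b = [F(1, 4), F(1, 2)]
x0 = [F(1000), F(900), F(800)]

x1 = leslie_step(a, b, x0)
print([int(v) for v in x1])  # [5900, 250, 450]
```

Because `b` is one entry shorter than `x`, the oldest group contributes daughters through (1.90) but nobody survives out of it, exactly as the last column of (1.89) requires.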
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The Leslie matrix has some interesting properties which we can use to 
investigate the general behaviour of the solution of the population equation (1.92). 
Of course, since this equation has our usual standard form we know from (1.61) that 
the solution of (1.92) is 


$$X(k) = L^k X(0), \quad k = 1, 2, 3, \ldots \tag{1.93}$$

We have seen that the expression (1.93) will depend upon the eigenvalues and
eigenvectors of L; an important fact which can be proved is that any matrix L in the
form (1.89) always has at least one positive eigenvalue, which we shall denote by
$\lambda_1$, and this is unique (i.e. it occurs only once as a root of the characteristic equation
of L).


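EXAMPLE 1.25


Consider the case n = 3, when

$$L = \begin{bmatrix} a_1 & a_2 & a_3 \\ b_1 & 0 & 0 \\ 0 & b_2 & 0 \end{bmatrix} \tag{1.94}$$

The characteristic equation of L is

$$\begin{aligned} 0 = \det(\lambda I - L) &= \begin{vmatrix} \lambda - a_1 & -a_2 & -a_3 \\ -b_1 & \lambda & 0 \\ 0 & -b_2 & \lambda \end{vmatrix} \\ &= (\lambda - a_1)\lambda^2 + a_2(-b_1\lambda) - a_3b_1b_2 \\ &= \lambda^3 - a_1\lambda^2 - a_2b_1\lambda - a_3b_1b_2 \end{aligned} \tag{1.95}$$

where to obtain (1.95) we again used the expressions (1.86) and (1.87). Let $\lambda_1$ be
the positive eigenvalue of L in (1.94), and consider the product

$$L \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1b_2/\lambda_1^2 \end{bmatrix} = \begin{bmatrix} a_1 + a_2b_1/\lambda_1 + a_3b_1b_2/\lambda_1^2 \\ b_1 \\ b_1b_2/\lambda_1 \end{bmatrix} \tag{1.96}$$

Since $\lambda_1$ satisfies the equation (1.95), we can write

$$\lambda_1^3 = a_1\lambda_1^2 + a_2b_1\lambda_1 + a_3b_1b_2$$

and dividing both sides of this by $\lambda_1^2$ shows that the first element on the
right-hand side of (1.96) is just $\lambda_1$. The equation (1.96) therefore reduces to

$$Lu = \lambda_1 u \tag{1.97}$$

where

$$u = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1b_2/\lambda_1^2 \end{bmatrix} \tag{1.98}$$

showing that u is an eigenvector corresponding to $\lambda_1$.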


EXERCISE 1.44 Verify using (1.95) that the Leslie matrix

$$L = \begin{bmatrix} 0 & 7 & 6 \\ \tfrac{1}{4} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{bmatrix} \tag{1.99}$$

has an eigenvalue $\lambda_1 = \tfrac{3}{2}$. Determine a corresponding eigenvector using (1.98), and
verify that it satisfies (1.97). Find also the other two eigenvalues of L.


We now list some properties of the general $n \times n$ Leslie matrix L in (1.89):

(i) $\det(\lambda I - L) = \lambda^n - a_1\lambda^{n-1} - a_2b_1\lambda^{n-2} - a_3b_1b_2\lambda^{n-3} - \cdots - a_nb_1b_2 \cdots b_{n-1}$

The case n = 3 is given in (1.95).

(ii) There is a unique positive eigenvalue $\lambda_1$ with corresponding eigenvector

$$u = \begin{bmatrix} 1 \\ b_1/\lambda_1 \\ b_1b_2/\lambda_1^2 \\ b_1b_2b_3/\lambda_1^3 \\ \vdots \\ b_1b_2 \cdots b_{n-1}/\lambda_1^{n-1} \end{bmatrix} \tag{1.100}$$

The case n = 3 is given in (1.98).

(iii) If $\lambda_i$ is any other eigenvalue (real or complex) of L then $|\lambda_i| \leq \lambda_1$, and $\lambda_1$
is called the dominant eigenvalue of L. Furthermore, provided that in
the first row of L there are two successive entries $a_i$ and $a_{i+1}$ which are
both non-zero, then $\lambda_1$ is strictly dominant, which means $|\lambda_i| < \lambda_1$,
$i = 2, 3, \ldots, n$.


The assumption in (iii) is a reasonable one since it requires that there are two 
successive fertile age groups, which will usually be the case in realistic population 
models when the duration of each age group is sufficiently small.


EXAMPLE 1.26


The Leslie matrix in (1.99) has $a_2 > 0$, $a_3 > 0$ so that the condition in (iii) is
satisfied. The eigenvalues of L are $\lambda_1 = \tfrac{3}{2}$, $\lambda_2 = -1$, $\lambda_3 = -\tfrac{1}{2}$, showing that $\lambda_1$ is the
strictly dominant eigenvalue with $|\lambda_1| > |\lambda_2|$, $|\lambda_1| > |\lambda_3|$.


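We can now apply the properties (i)-(iii) to investigate what happens to the
solution (1.93) when k becomes large. Suppose that L has been diagonalized as in
(1.70), where

$$T^{-1}LT = D \tag{1.101}$$

and D is the diagonal matrix of eigenvalues of L. Multiply (1.101) on the left by T
and on the right by $T^{-1}$, and remember that $TT^{-1} = T^{-1}T = I$, to get

$$L = TDT^{-1}$$

Hence

$$L^2 = TDT^{-1}TDT^{-1} = TDIDT^{-1} = TD^2T^{-1}$$

and repeating this process gives

$$L^k = TD^kT^{-1}, \quad k = 2, 3, 4, \ldots$$

Since D is a diagonal matrix with $\lambda_1, \lambda_2, \ldots, \lambda_n$ along the principal diagonal we have
seen that

$$D^k = \begin{bmatrix} \lambda_1^k & 0 & \cdots & 0 \\ 0 & \lambda_2^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n^k \end{bmatrix}, \quad k = 1, 2, \ldots$$

The solution in (1.93) therefore becomes

$$X(k) = L^kX(0) = TD^kT^{-1}X(0)$$

and dividing both sides by $\lambda_1^k$ produces

$$\frac{1}{\lambda_1^k}X(k) = T \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & (\lambda_2/\lambda_1)^k & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (\lambda_n/\lambda_1)^k \end{bmatrix} T^{-1}X(0)$$

Assuming $\lambda_1$ is the strictly dominant eigenvalue of L we have $|\lambda_i/\lambda_1| < 1$, so as in
Section 1.2 it follows that

$$\left(\frac{\lambda_i}{\lambda_1}\right)^k \to 0 \ \text{ as } k \to \infty, \quad \text{for } i = 2, 3, \ldots, n$$

We have therefore shown that as $k \to \infty$

$$\frac{1}{\lambda_1^k}X(k) \to T \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} T^{-1}X(0) \tag{1.102}$$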
: oom 


Recall that T is a constant matrix whose columns are eigenvectors of L corresponding
to $\lambda_1, \ldots, \lambda_n$, so in particular the first column of T is the eigenvector u in (1.100).
Let the first element of the vector $T^{-1}X(0)$ be denoted by c, a constant. Then the
product on the right-hand side of (1.102) reduces to cu. To see how this works out,
let’s just look at the case n = 3. The right-hand side in (1.102) is

$$T \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} c \\ \cdot \\ \cdot \end{bmatrix} = T \begin{bmatrix} c \\ 0 \\ 0 \end{bmatrix}$$

where the dots indicate elements whose values don’t matter. If the second and third
columns in T are $u_2$ and $u_3$ this product is

$$[u, u_2, u_3] \begin{bmatrix} c \\ 0 \\ 0 \end{bmatrix} = cu$$

as required. We have therefore ended up with the result that as $k \to \infty$

$$\frac{1}{\lambda_1^k}X(k) \to cu$$

We can write this as

$$X(k) \to c\lambda_1^k u \tag{1.103}$$


which means that for large values of k, X(k) behaves like $c\lambda_1^k u$ where c is a constant
depending upon the initial vector X(0).

If $\lambda_1 > 1$ equation (1.103) shows that the population is eventually increasing; if
$\lambda_1 < 1$ the population is eventually decreasing (and so ends up at zero); and if $\lambda_1 = 1$
the population eventually stabilizes with zero growth. In this latter case $X(k) \to cu$,
showing that for sufficiently large k the age distribution becomes a multiple of the
eigenvector u corresponding to the eigenvalue $\lambda = 1$.

Replacing k by k - 1 in (1.103) gives

$$X(k-1) \to c\lambda_1^{k-1}u$$

for large values of k, so we can conclude by comparing with (1.103) that

$$X(k) \to \lambda_1 X(k-1) \tag{1.104}$$

This means that after a sufficiently long time the age distribution vector X(k) is $\lambda_1$
times the value X(k-1) for the preceding time period, so that the proportion of
females in each of the age groups becomes constant.


EXAMPLE 1.27


Return again to the Leslie matrix (1.99) whose strictly dominant eigenvalue was
noted in Example 1.26 to be $\lambda_1 = \tfrac{3}{2}$. The eigenvector u in (1.98) is

$$u = \begin{bmatrix} 1 \\ \tfrac{1}{6} \\ \tfrac{1}{18} \end{bmatrix}$$

For large values of k we have seen in (1.104) that

$$X(k) \to \tfrac{3}{2}X(k-1)$$

so after a sufficiently long time (k = N, say) we can say that (to a close
approximation)

$$X(N) = \tfrac{3}{2}X(N-1)$$

and similarly

$$X(N+1) = \tfrac{3}{2}X(N), \quad X(N+2) = \tfrac{3}{2}X(N+1), \ \ldots$$

This means that the population growth becomes constant, since during each
time period the number of females increases by 50%.
From (1.103) we see that after a long time has elapsed

$$X(k) \to c\left(\tfrac{3}{2}\right)^k \begin{bmatrix} 1 \\ \tfrac{1}{6} \\ \tfrac{1}{18} \end{bmatrix} \tag{1.105}$$

which shows that the numbers of females will be distributed among the three
age groups in the ratios $1 : \tfrac{1}{6} : \tfrac{1}{18}$, or 18 : 3 : 1. This converts into percentages within
each of the three age groups of

$$\frac{18}{22} \times 100 = 81.8\%, \quad \frac{3}{22} \times 100 = 13.6\%, \quad \frac{1}{22} \times 100 = 4.5\%$$


EXERCISE 1.45 A population model in the form (1.92) has Leslie matrix 

$$L = \begin{bmatrix} 0 & 13 & 12 \\ \tfrac{1}{4} & 0 & 0 \\ 0 & \tfrac{1}{2} & 0 \end{bmatrix}$$

Determine the eigenvalues of L using (1.95), and hence show that the ultimate 
tendency of the population is to double in size each time period. Show also that the 
distribution in the three age groups will approximate to 86.5%, 10.8%, 2.7% after a 
sufficiently long time has elapsed. 


EXERCISE 1.46 A certain species of insect obeys the following rules for breeding and 
survival of females: 


(i) % survive their first birthday and live into a second year; 

(ii) ; of these survive their second birthday and live into their third year; 

(iii) by the end of the third year all the original insects are dead; 

(iv) no insects are born until a female survives into its second year, when an 
average of seven new insects are produced, and this average drops to six in 
the third year of a female’s life. 


Obtain the model in the form (1.92), and deduce that the insects are doomed to 
extinction. 


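It’s interesting to end our discussion of population models by giving an explicit 
expression for $A^k$ when the n x n matrix A has distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$. 
It can be shown that 

$$A^k = \lambda_1^k Z_1 + \lambda_2^k Z_2 + \cdots + \lambda_n^k Z_n, \qquad k = 1, 2, 3, \ldots \qquad (1.106)$$

where each $Z_i$ is a constant n x n matrix. For example, when n = 2 then 

$$Z_1 = \frac{A - \lambda_2 I}{\lambda_1 - \lambda_2}, \qquad Z_2 = \frac{A - \lambda_1 I}{\lambda_2 - \lambda_1}$$

and when n = 3 

$$Z_1 = \frac{(A - \lambda_2 I)(A - \lambda_3 I)}{(\lambda_1 - \lambda_2)(\lambda_1 - \lambda_3)}, \quad Z_2 = \frac{(A - \lambda_1 I)(A - \lambda_3 I)}{(\lambda_2 - \lambda_1)(\lambda_2 - \lambda_3)}, \quad Z_3 = \frac{(A - \lambda_1 I)(A - \lambda_2 I)}{(\lambda_3 - \lambda_1)(\lambda_3 - \lambda_2)} \qquad (1.107)$$

You might be able to spot the pattern for a general value of n: $Z_i$ is a product of 
matrices 

$$(A - \lambda_1 I)(A - \lambda_2 I) \cdots (A - \lambda_n I)$$

excluding the factor $(A - \lambda_i I)$, divided by the product 

$$(\lambda_i - \lambda_1)(\lambda_i - \lambda_2) \cdots (\lambda_i - \lambda_n)$$

excluding the factor $(\lambda_i - \lambda_i)$. 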
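The general pattern for the matrices in (1.106) translates directly into a short routine. The following is a minimal sketch using exact rational arithmetic; the function names (`constituent`, `power_from_constituents`) and the 2 x 2 test matrix are assumptions made for illustration, and the eigenvalues are supplied by hand rather than computed.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def constituent(a, eigenvalues, i):
    """Z_i: product of (A - lam_j I) over j != i, divided by the product of (lam_i - lam_j)."""
    n = len(a)
    z = identity(n)
    for j, lam in enumerate(eigenvalues):
        if j == i:
            continue  # exclude the factor belonging to lam_i itself
        factor = [[a[r][c] - (lam if r == c else 0) for c in range(n)]
                  for r in range(n)]
        scale = eigenvalues[i] - lam
        z = [[entry / scale for entry in row] for row in mat_mul(z, factor)]
    return z


def power_from_constituents(zs, eigenvalues, k):
    """Evaluate lam_1^k Z_1 + ... + lam_n^k Z_n."""
    n = len(zs[0])
    return [[sum(lam ** k * z[r][c] for lam, z in zip(eigenvalues, zs))
             for c in range(n)] for r in range(n)]


# Hypothetical example: A = [[2, 1], [1, 2]] has distinct eigenvalues 3 and 1.
A = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(2)]]
lams = [Fraction(3), Fraction(1)]
Z = [constituent(A, lams, i) for i in range(len(lams))]

# Compare with A^5 obtained by repeated multiplication.
direct = identity(2)
for _ in range(5):
    direct = mat_mul(direct, A)

print(power_from_constituents(Z, lams, 5) == direct)
```

Exact fractions are used so that the comparison with repeated multiplication is an equality test rather than a tolerance check; the formula requires the eigenvalues to be distinct, otherwise a denominator vanishes.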


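EXAMPLE 1.28 

Return again to the 3 x 3 Leslie matrix L in (1.99) which was found to have 
eigenvalues $\lambda_1 = \tfrac{3}{2}$, $\lambda_2 = -1$, $\lambda_3 = -\tfrac{1}{2}$. We looked at the behaviour of $L^k$ for large 
values of k in Example 1.27, but we can now obtain an explicit expression for $L^k$ 
for any value of the positive integer k. Using (1.107) with A equal to the matrix 
L in (1.99) and the stated values for the eigenvalues we have 

$$Z_1 = \frac{(L + I)(L + \tfrac{1}{2} I)}{(\tfrac{3}{2} + 1)(\tfrac{3}{2} + \tfrac{1}{2})} = \frac{1}{5} \begin{bmatrix} 1 & 7 & 6 \\ \tfrac{1}{4} & 1 & 0 \\ 0 & \tfrac{1}{2} & 1 \end{bmatrix} \begin{bmatrix} \tfrac{1}{2} & 7 & 6 \\ \tfrac{1}{4} & \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} & \tfrac{1}{2} \end{bmatrix} = \frac{1}{5} \begin{bmatrix} \tfrac{9}{4} & \tfrac{27}{2} & 9 \\ \tfrac{3}{8} & \tfrac{9}{4} & \tfrac{3}{2} \\ \tfrac{1}{8} & \tfrac{3}{4} & \tfrac{1}{2} \end{bmatrix} \qquad (1.108)$$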
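EXERCISE 1.47 Verify that 

$$Z_2 = \frac{4}{5} \begin{bmatrix} 1 & -4 & -6 \\ -\tfrac{1}{4} & 1 & \tfrac{3}{2} \\ \tfrac{1}{8} & -\tfrac{1}{2} & -\tfrac{3}{4} \end{bmatrix}, \qquad Z_3 = \begin{bmatrix} -\tfrac{1}{4} & \tfrac{1}{2} & 3 \\ \tfrac{1}{8} & -\tfrac{1}{4} & -\tfrac{3}{2} \\ -\tfrac{1}{8} & \tfrac{1}{4} & \tfrac{3}{2} \end{bmatrix}$$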


From (1.106) we can now write 

$$L^k = \left(\tfrac{3}{2}\right)^k Z_1 + (-1)^k Z_2 + \left(-\tfrac{1}{2}\right)^k Z_3 \qquad (1.109)$$

The solution of the population equation is 

$$X(k) = L^k X(0)$$

so that for a given initial state X(0) we can determine X(k). Furthermore, for large 
values of k (1.109) shows that 

$$X(k) \to \left(\tfrac{3}{2}\right)^k Z_1 X(0) \qquad (1.110)$$

which is an explicit expression, unlike (1.105) which contains an unknown constant c. 


EXERCISE 1.48 If 

$$X(0) = \begin{bmatrix} 200 \\ 160 \\ 80 \end{bmatrix}$$

compute the products $Z_1 X(0)$, $Z_2 X(0)$, $Z_3 X(0)$ using the expression (1.108) and 
Exercise 1.47. Hence determine X(10) and the expression in (1.110). Compare the 
latter with (1.105). 
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EXERCISE 1.49 Obtain an expression for A'” using (1.105) when A is the matrix in 


Exercise 1.39, 


EXERCISE 1.50 Use (1.105) to determine A* when 


11 
A=|2 2 
1 0 
Hence show that the solution of 


X(k+1)=AX(K), X(0) = H 


approaches as k becomes large. 


wi wl 


EXERCISE 1.51 A certain population of insects is described by the standard model (1.92) with 

$$L_1 = \begin{bmatrix} 0 & 2 & 3 \\ \tfrac{1}{2} & 0 & 0 \\ 0 & \tfrac{2}{3} & 0 \end{bmatrix}$$

In a different environment it is found that this changes to 

$$L_2 = \begin{bmatrix} 0 & 3 & 4 \\ \tfrac{1}{3} & 0 & 0 \\ 0 & \tfrac{3}{4} & 0 \end{bmatrix}$$

Show that $L_1$ and $L_2$ have the same strictly dominant eigenvalue. Use your calculator to 
estimate this eigenvalue, and hence deduce that in either environment the population 
increases annually by about 32.5% after a sufficiently long time has elapsed. 

In each case use (1.98) and (1.103) to obtain the ratios of the numbers in the 
three age groups after a long time has elapsed. 


PROBLEMS 


1.1 A sum of £200 is placed in a savings account which pays interest of r% compounded 
annually. If at the end of 5 years the account contains £270.23, determine r. 


1.2 Consider again the aquarium model discussed in Examples 1.3, 1.6 and Exercise 1.14. 
Suppose that at the end of each week p units of water are removed from the 
aquarium, and p + 1 units of fresh water are added so as to bring the water level back 
to normal (recall that 1 unit evaporates each week). Obtain the difference equation 
corresponding to (1.6). Deduce that after a long period of time the salt concentration 
approaches (p + 1)/p times the original concentration (it is assumed as before that n, 
the total number of units of water in the aquarium, is large). Notice that this agrees 
with the case p = 2 in Example 1.6. 


1.3 Verify by direct substitution that 
X= axy + at + 2? at 2437 ah 34... 4 
is the general solution of 
HS eR kale ey ohaee 


What does the solution become when a = 1? 


1.4 The following model has been suggested to describe pollution of Lakes Erie and 
Ontario in North America: 


(i) all the outflow from Lake Erie flows into Lake Ontario; 
(ii) each year 38% of the water in Lake Erie and 13% of the water in Lake 
Ontario is replaced. 


Let $x_k$ and $y_k$ denote the total amounts of pollution present in Lakes Erie and Ontario 
respectively at the start of year k. 

For the period under consideration (i.e. from k = 0) new laws are introduced to 
protect the environment and these ensure that there is no further pollution of the lakes. 
Obtain the difference equations expressing $x_{k+1}$ and $y_{k+1}$ in terms of $x_k$ and $y_k$. Solve 
these equations and show that the amount of pollution in Lake Erie is reduced to 10% 
of its original value after 5 years. Assuming that $x_0 = 3y_0$, show that to achieve the 
same reduction for Lake Ontario takes approximately 29 years. 


1.5 Denote the Fibonacci numbers defined in Section 1.1 by $f_0 = 1$, $f_1 = 1$, $f_2 = 2$, and so 
on. Prove the following identities by the method of induction (see the appendix to this 
chapter): 


(a) $f_0 + f_1 + f_2 + \cdots + f_n = f_{n+2} - 1$, $n \ge 0$ 
(b) $f_1 + f_3 + f_5 + \cdots + f_{2n-1} = f_{2n} - 1$, $n \ge 1$ 
(c) $f_{2n+1}^2 = f_{2n} f_{2n+2} - 1$, $n \ge 0$. 


1.6 Fibonacci numbers arise in connection with the following model of the behaviour of 
atoms of hydrogen gas. 

A single electron belonging to an atom is initially in the ground level of energy 
(state 0) and is assumed to gain and lose energy alternately in succession. The rules 
are: 

(i) when the gas gains radiant energy, all the electrons in state 1 rise to state 2; 
half those in state 0 rise to state 1 and half to state 2; 

(ii) when the gas loses energy, all the electrons in state 1 fall to state 0; half 
those in state 2 fall to state 1 and half to state 0. 


The histories of the states occupied by an electron are then as follows: 


After an initial energy gain there are two possible histories, either 01 or 02. 

There is then an energy loss, giving three histories: 010, 021, 020. 

After the next energy gain there are five possible histories: 0101, 0102, 0212, 
0201, 0202. 

The sequence of events can be represented diagrammatically as in Figure 1.11. 

[Figure 1.11: tree of electron state histories through an energy gain, a loss and a further gain] 

Extend this diagram for the next sequence of energy loss and gain, and verify that 
the numbers of different histories of the electron are the next two Fibonacci numbers. 


1.7 The binomial expansion formula is 


Nee n Pa n \pn-1, 2" 
asarat+(tla+(s)a+ +(.,}2 +a 


where n is a positive integer and 


("\ ee Mi (n= 
r}) rn-r)! n 
(")=0. r>n, rl=1x2x3x+--x(r-1) xr 


Apply this to the expression (1.8) for the kth Fibonacci number x,, Hence show that 


1 
ae (a ia Oe Hsse(BEt ot RS tt 
Verify this is correct for k= 1, 2,3, 4, 5. 


1.8 The Lucas numbers 
1, 3, 4, 7, 11, 18, 29, ... 

also turn up in applications related to the natural world. They are defined by the same 
difference equation as for Fibonacci numbers, namely 

$$L_{k+2} = L_{k+1} + L_k, \qquad k = 0, 1, 2, \ldots$$

but with different initial values $L_0 = 1$, $L_1 = 3$. Use the solution obtained in (1.30) to 
show that 

$$L_k = \left(\frac{1 + \sqrt{5}}{2}\right)^{k+1} + \left(\frac{1 - \sqrt{5}}{2}\right)^{k+1}$$

Check this result for k = 4. 


1.9 Using the result in the previous problem, together with the expression in (1.8) for 
$f_k$, prove that the Fibonacci numbers $f_k$ and the Lucas numbers $L_k$ satisfy the 
relationship 

$$f_{2k+1} = f_k L_k, \qquad k = 1, 2, 3, \ldots$$

Check this result for k = 4. 


Verify the identity 
140407 e+ 


By differentiating both sides with respect to @, obtain the identity given in Exercise 
115. 


1.11 Wild plants propagate by self-seeding. Field observations of a certain species suggest 
the following rules: 


(i) the plants flower and produce seeds either 1 year or 2 years after 
germination; 

(ii) plants die after flowering; 

(iii) 20% of the seeds produce plants that flower after 1 year, whereas 50% of 
the seeds produce plants that flower after 2 years, and the remainder (30%) 
of the seeds fail to produce plants that survive to produce more seeds; 

(iv) on average each plant flowering after 1 year produces 350 seeds, compared 
with a figure of 750 for plants flowering after 2 years. 


Let $s_k$ denote the number of seeds produced by flowers in year k. Show that 

$$s_{k+2} = 70 s_{k+1} + 375 s_k, \qquad k = 0, 1, 2, \ldots$$

If initially there are 10 seeds (i.e. $s_0 = 10$) deduce that $s_1 = 700$ and obtain a general 
expression for $s_k$. 
1.12 Solve the equation 

$$x_{k+1} = x_k + (k+1)^2, \qquad k = 0, 1, 2, \ldots$$

subject to $x_0 = 0$ by substituting 

$$x_k = ak^3 + bk^2 + ck$$

Equate coefficients of powers of k to obtain three equations for the three constants a, 
b, c and hence show that $a = \tfrac{1}{3}$, $b = \tfrac{1}{2}$, $c = \tfrac{1}{6}$. 
Notice that 

$$x_1 = 1^2, \quad x_2 = x_1 + 2^2 = 1^2 + 2^2, \quad x_3 = x_2 + 3^2 = 1^2 + 2^2 + 3^2, \ \ldots$$

so it follows from the expression for $x_k$ that 

$$1^2 + 2^2 + 3^2 + \cdots + k^2 = \tfrac{1}{3}k^3 + \tfrac{1}{2}k^2 + \tfrac{1}{6}k = \tfrac{1}{6}k(k+1)(2k+1)$$

Use this method to solve 
$$x_k = x_{k-1} + (2k - 1), \qquad k = 1, 2, 3, \ldots$$

subject to $x_0 = 0$. Hence prove that 

$$1 + 3 + 5 + \cdots + (2k - 1) = k^2$$


1.13 Modify the method used in the preceding problem to solve 

$$x_{k+1} = x_k + (k+1)^3, \qquad k = 0, 1, 2, \ldots$$

subject to $x_0 = 0$. Hence obtain an expression for the sum of the cubes of the integers 
from 1 to k. 


1.14 A game with two players (A and B) involves tossing coins which have an equal 
probability of coming up heads or tails. The rules are: 


(i) if a coin comes up heads, A gives B one coin; 
(ii) if a coin comes up tails, B gives A one coin. 


The winner is the player who first ends up with all the coins. 
Suppose that initially A has k coins and B has N - k coins. Let $p_k$ be the 
probability that A wins, so that $p_0 = 0$ and $p_N = 1$. Show that 

$$p_{k+1} = \tfrac{1}{2} p_{k+2} + \tfrac{1}{2} p_k, \qquad k = 0, 1, 2, \ldots$$

Solve this equation to obtain $p_k = k/N$. 


1.15 A coin has a probability p of coming up heads and q = 1 - p of coming up tails when 
tossed. A gambler wins £1 if the coin shows heads, and loses £1 if it shows tails. The 
gambler begins with £a and aims to quit when he or she has £b (with b > a). If the 
gambler loses all his or her money before achieving this goal then the gambler is said 
to be ruined. 

Let $p_k$ denote the probability of eventual ruin when the gambler has £k. Notice 
that $p_0 = 1$ (the gambler is already ruined) and $p_b = 0$ (the gambler has won). Show 
that 

$$p\,p_{k+1} - p_k + (1 - p)\,p_{k-1} = 0, \qquad k = 1, 2, 3, \ldots$$

Solve this difference equation for each of the cases $p \ne q$ and $p = q$. Hence show that 
the probability of eventual ruin with the original stake of £a is 

$$p_a = \frac{\alpha^a - \alpha^b}{1 - \alpha^b}, \quad \alpha = \frac{q}{p} \ne 1; \qquad p_a = 1 - \frac{a}{b}, \quad p = q$$

Deduce that when the coin is unbiased (i.e. p = q) the gambler has a 50% chance of 
being ruined when trying to double the original stake, and a 75% chance when trying 
to quadruple this stake. 


1.16 Owing to restricted water supplies a farmer can irrigate his or her fields only from 
9 p.m. to 9 a.m. During this period the farmer adds a quantity c of water to the 
topsoil. However, during the period from 9 a.m. to 9 p.m. half the total water in the 
topsoil is lost through evaporation and absorption. 

Let $x_k$ denote the amount of water in the topsoil at the end of the kth 12 hour 
period, starting with $x_0$ at 9 p.m. on the first day. Show that 

$$x_{k+2} = \tfrac{1}{2} x_k + \tfrac{1}{4} c\,[3 - (-1)^k], \qquad k = 0, 1, 2, \ldots$$

by considering separately the cases k odd and k even. 
Solve this equation, and hence deduce that when this irrigation programme has 
been followed for a long time, $x_k$ essentially oscillates between c and 2c. 


1.17 Use the z-transform to obtain the solution of 

$$x_{k+1} = a x_k + c^k, \qquad k = 0, 1, 2, \ldots$$

for each of the cases $c \ne a$, $c = a$. 


1.18 Return to the simple model of a national economy described in Exercise 1.10. If $\alpha = \tfrac{1}{2}$, 
$\beta = 1$, government spending is at a constant level $G_k = G$ (all k) and $I_0 = 2G$, $I_1 = 3G$, 
show by using the z-transform that 

$$I_k = 2G\left[1 + \left(\frac{1}{\sqrt{2}}\right)^k \sin\left(\tfrac{1}{4} k\pi\right)\right], \qquad k = 0, 1, 2, \ldots$$

You will need the fact that 

$$Z\left[\left(\frac{1}{\sqrt{2}}\right)^k \sin\left(\tfrac{1}{4} k\pi\right)\right] = \frac{z}{2z^2 - 2z + 1}$$

The assumptions of the model therefore produce a national income which oscillates 
because of the sine term, and as $k \to \infty$ the national income approaches twice 
government expenditure. 


1.19 (a) The hyperbolic functions are defined by 

$$\cosh x = (e^x + e^{-x})/2, \qquad \sinh x = (e^x - e^{-x})/2$$

Show that $\cosh^2 x - \sinh^2 x = 1$. If w is the positive solution of the equation 
$\cosh x = 3/2$, show that $\sinh w = \sqrt{5}/2$. 
Given the transforms 

$$Z(\sinh wk) = \frac{z \sinh w}{z^2 - 2z\cosh w + 1}, \qquad Z(\cosh wk) = \frac{z(z - \cosh w)}{z^2 - 2z\cosh w + 1}$$
show that 
TRO Sr ip Se lata) 


2-3z+1 
where c is a constant. 

Figure 1.12 

(b) A so-called ladder network of resistors R is shown in Figure 1.12. The applied 
voltage is V and $i_k$ is the current in the kth loop. It can be shown by what is 
known as Kirchhoff’s law that 

$$i_{k+2} - 3i_{k+1} + i_k = 0, \qquad k = 0, 1, 2, \ldots$$

with $i_1 = 2i_0 - V/R$. Obtain an expression for the z-transform of $i_k$. Use the result 
in part (a) to determine $i_k$ in terms of $i_0$, w, V and R. 
1.20 The definition (1.41) of the z-transform of a sequence $x_k$, k = 0, 1, 2, ..., states that 

$$Z(x_k) = x_0 + \frac{x_1}{z} + \frac{x_2}{z^2} + \frac{x_3}{z^3} + \cdots$$

Differentiate both sides with respect to z and hence deduce that 

$$Z(k x_k) = -z \frac{d}{dz} Z(x_k)$$

Use this result to show that 

$$Z(k c^k) = \frac{cz}{(z - c)^2}$$

given that 

$$Z(c^k) = \frac{z}{z - c}$$

where c is a constant. 


1.21 A simple economic model of ‘supply and demand’ assumes that if the supply of a 
commodity in year k is $s_k$ then the price is $p_k = a - b s_k$, where a and b are positive 
constants (this implies that the price declines if the supply increases). It is also 
assumed that the supply in year k + 1 is proportional to the price in the previous year, 
that is $s_{k+1} = c p_k$ where c is a positive constant. 

Obtain the difference equation satisfied by $p_k$ and find its general solution. Deduce 
that if bc < 1 then the price stabilizes after a sufficiently long time has elapsed. 


1.22 Consider the following simple model of trade between two countries only, where the 
exports of one are the imports of the other. The assumptions are: 


(i) national income ($x_i$) = consumption outlays ($c_i$) + net investment ($v_i$) 
+ exports ($e_i$) - imports ($m_i$) 

(ii) outlays for domestic consumption ($d_i$) = $c_i - m_i$ 

(iii) domestic consumption $d_i(k) = a_{ii} x_i(k-1)$ (multiple of national income in 
previous time period) 

(iv) imports are proportional to national income in previous year, that is 

$$m_1(k) = a_{21} x_1(k-1), \qquad m_2(k) = a_{12} x_2(k-1)$$

where throughout i = 1, 2 denotes the two countries and the $a_{ij}$ are 
constants. 
Show that 

$$\begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} x_1(k-1) \\ x_2(k-1) \end{bmatrix} + \begin{bmatrix} v_1 \\ v_2 \end{bmatrix}, \qquad k = 0, 1, 2, \ldots$$

1.23 Consider a simplified cattle ranching model where the numbers of females in year k 
are 


$x_1(k)$ = number of 1-year-olds (‘young’) 
$x_2(k)$ = number of 2-year-olds (‘mature’) 
$x_3(k)$ = number of 3-year-olds and older (‘old’) 


In the absence of slaughtering the assumptions concerning breeding and mortality are 
as follows: 


(i) young females do not breed; 
(ii) a mature female produces on average 0.8 young cattle per year; 
(iii) an old female produces on average 0.4 young cattle per year; 
(iv) only old cattle die, at the rate of 30% per year. 


Show that 

$$\begin{bmatrix} x_1(k+1) \\ x_2(k+1) \\ x_3(k+1) \end{bmatrix} = \begin{bmatrix} 0 & 0.8 & 0.4 \\ 1 & 0 & 0 \\ 0 & 1 & 0.7 \end{bmatrix} \begin{bmatrix} x_1(k) \\ x_2(k) \\ x_3(k) \end{bmatrix}, \qquad k = 0, 1, 2, \ldots$$

Notice that in this case the matrix does not have the Leslie form (1.89), since we have 
not assumed that the maximum lifespan of cattle is 3 years. 


1.24 In the car rental model described in Example 1.23, after several years have passed it is 
found that 80% of cars rented at B are dropped off at L, whereas only 25% of those 
rented at L are dropped off at B. Otherwise, as before, cars are returned to the 
originating office. Find the new equilibrium situation in this case, assuming that the 
total company fleet has grown to 210 cars. 


1.25 Two TV channels show competing newscasts in the same time slot every evening. 
Audience research amongst those who always watch the news shows that if a viewer 
watches Channel X on one evening there is a 50% chance of switching to Channel Y 
the following evening. However, the programme on Channel Y is more enjoyable, so 
there is only a 40% chance of returning to Channel X. Let $x_k$, $y_k$ be the probabilities 
that a viewer is watching Channel X or Y respectively on the kth day. Show that 

$$x_{k+1} = \tfrac{1}{2} x_k + \tfrac{2}{5} y_k, \qquad y_{k+1} = \tfrac{1}{2} x_k + \tfrac{3}{5} y_k, \qquad k = 0, 1, 2, \ldots$$

Write these equations in matrix form and obtain the general solution. Deduce that 
after a long enough time has elapsed the probabilities that a viewer watches the news 
on Channel X or Channel Y are $\tfrac{4}{9}$ and $\tfrac{5}{9}$ respectively. 


1.26 The following model has been proposed for a trout fish farm. In the four stages in the 
life of a trout, define at year k: 


$x_1(k)$ = number of eggs 
$x_2(k)$ = number of fry 
$x_3(k)$ = number of young 
$x_4(k)$ = number of adults 


Define also 


$u_1(k)$ = number of eggs added artificially 
$u_2(k)$ = number of young removed for stocking streams 


From observations over several years it is found that: 


(i) number of eggs at year k = (number of eggs added in year k) 
+ (number of eggs laid by adults, $\propto x_4(k)$) 
- (number of eggs eaten by fry, $\propto x_2(k)$) 
- (number of eggs eaten by young, $\propto x_3(k)$); 
(ii) a constant proportion of eggs in year k survive to become fry in the 
following year; 
(iii) a constant proportion of fry in year k survive to become young in the 
following year; 
(iv) number of adults in year k $\propto$ (number of young + number of adults 
- number of young removed) in previous year. 


Show that 


xXy(k + 1) a, -@, 4, || x(k) a, O 
xy(k+1)|=| a, 0 O]fx,]+]0 0 
xq(k+ 1) 0 as asi\x(| | 0 4s 


where the a; are positive constants, 


1.27 For a certain population of wild animals, censuses taken twice yearly at the end of 
April and at the end of October reveal that no animal lives for more than 18 months. 
This period is divided into three time intervals each of length 6 months. Let x, (k), 
x,(k) and x,(k) be the numbers of animals in the kth 6 monthly period in each of the 
age groups 0-6 months (young), 6-12 months (mature) and 12-18 months (old) at 
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the time of the April census. Let y,(k), y.(k), ¥3(k) be the corresponding numbers at 
the time of the October census. Observations show that 


nk} ]0 4 6|) 4k) 
yolk) |=|2 0 0]] x(k) 
ys] [9 F Of] 
Yk) x(k) 
and also 
O23 
X(k+1)=|4 0 0|K®) 
reed 
9 J 


(a) Obtain the matrix difference equation 
Y¥(k+1)=AY(k), k=0,1,2,... 
and determine the eigenvalues of A. Similarly obtain the expression 
X(k+ 1) = BX(k) 
and verify that B has the same eigenvalues as A. 


(b) Consider the equilibrium situation which is attained after a long time has elapsed. 
Show that in this case the population increases annually by a factor g 

(c) Because of this population growth the animals have become a pest, so it is 
decided to operate an extermination programme. In this scheme of young 
animals, } of mature animals and 33 of old animals will be killed each October. 
Show that after a sufficiently long time the population will decrease by 50% 
every year (and so will eventually become extinct). 


1.28 A simplified model for the wild buffalo population in the American west in 1830 has 
been suggested as follows. Let $F_k$, $M_k$ be the numbers of adult female and male 
buffalo respectively at the start of year k, where k = 0 corresponds to 1830. On the 
basis of observations the following rules apply: 


(i) 5% of adults die each year; 

(ii) the animals reach maturity at age 2 years; 

(iii) the number of new adult females alive at the beginning of year k + 2, 
taking into account infant mortality, is 12% of $F_k$; 

(iv) more male calves are born than female, the figure corresponding to (iii) 
being 14% of $F_k$. 


Show that 

$$F_{k+2} = 0.95 F_{k+1} + 0.12 F_k$$
$$M_{k+2} = 0.95 M_{k+1} + 0.14 F_k, \qquad k = 0, 1, 2, \ldots$$

Use the z-transform to obtain an expression for $F_k$ in terms of $F_0$ and $F_1$. Hence 
deduce that when k is sufficiently large the population increases by 6.3% per year. 


1.29 Consider the Leslie matrix L in (1.89). Show that the average number of daughters 
born to a single female during her expected lifetime is 

$$d = a_1 + a_2 b_1 + a_3 b_1 b_2 + \cdots + a_n b_1 b_2 \cdots b_{n-1}$$

Deduce that L has an eigenvalue equal to 1 if and only if d = 1. 


1.30 A model of a population of mice uses the following assumptions: 


(i) mice do not mate until they are 1 month old (‘mature’); 
(ii) each pair of mature mice present at the end of 1 month produces two new 
pairs by the end of the next month; 
(iii) no mice die. 

Let $x_k$ denote the number of pairs of mice at the end of the kth month. Show that 

$$x_{k+2} - x_{k+1} - 2x_k = 0, \qquad k = 0, 1, 2, \ldots$$

Obtain the solution of this equation subject to the conditions $x_0 = 2$, $x_1 = 4$. How long 
will it take for the mouse population to exceed 500 pairs? 


1.31 A particle is moving along the x-axis in the direction of increasing x. Its x-coordinate 
after k seconds have elapsed since starting off is $x_k$. The distance the particle travels 
from time k to time k + 1 is equal to twice the distance it travelled from time k - 1 to 
time k. Show that 

$$x_{k+2} - 3x_{k+1} + 2x_k = 0, \qquad k = 0, 1, 2, \ldots$$

and find the expression for $x_k$ subject to $x_0 = 1$, $x_1 = 5$. 


1.32 An investor has £1000 and wishes to turn this into £1300 as quickly as possible by 
investing it in one of the following accounts: 


(i) annual interest of 8%; 
(ii) APR of 1 compounded quarterly; 
(iii) APR of 6} compounded monthly. 


Which option should be selected? 


APPENDIX: PROOF BY INDUCTION 


The method of induction is an important way of proving results which you have 
guessed may be correct. For example, let $S_n$ be the sum of the first n positive 
integers. By direct addition 

$$S_1 = 1, \quad S_2 = 1 + 2 = 3, \quad S_3 = 1 + 2 + 3 = 6, \quad S_4 = 1 + 2 + 3 + 4 = 10$$

and similarly $S_5 = 15$, $S_6 = 21$. You may be able to spot a pattern: what is happening 
is that 

$$S_1 = \frac{1 \times 2}{2}, \quad S_2 = \frac{2 \times 3}{2}, \quad S_3 = \frac{3 \times 4}{2}, \quad S_4 = \frac{4 \times 5}{2}$$

and sure enough this also works for $S_5$ and $S_6$: 

$$S_5 = \frac{5 \times 6}{2}, \quad S_6 = \frac{6 \times 7}{2}$$

It therefore looks as though in general 

$$S_n = \frac{n(n+1)}{2} \qquad (A1)$$

but it is important to realize that this remains a guess at this point. We can’t be sure 
that the formula (A1) is correct even if we verify that it works for n going from 1 to 
100, or even from 1 to 1000. 

It is a fallacy to assume that if a formula works for several (or even very many) 
values of n then it must be true for all values of n. A simple illustration of this 
fallacy is provided by the formula 

$$a_n = \tfrac{1}{24}(n^4 - 6n^3 + 23n^2 - 18n + 24) \qquad (A2)$$

It is easy to check with a calculator that 

$$a_1 = \tfrac{1}{24}(1 - 6 + 23 - 18 + 24) = 1$$
$$a_2 = \tfrac{1}{24}(16 - 6 \times 8 + 23 \times 4 - 18 \times 2 + 24) = 2$$
$$a_3 = 4 = 2^2, \quad a_4 = 8 = 2^3, \quad a_5 = 16 = 2^4$$

and it seems ‘obvious’ that for any positive integer n 

$$a_n = 2^{n-1} \qquad (A3)$$

However, you can also check that substituting n = 6 into (A2) produces $a_6 = 31$, 
which is not $2^5$ (= 32). Thus, although (A3) is correct for n = 1, 2, 3, 4, 5, it is not 
true for all values of n. 
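The calculator check described above is easy to automate. The short sketch below (the function name `a` is simply chosen to mirror the notation of (A2)) tabulates the formula against the guess (A3) for the first few values of n.

```python
def a(n):
    """Formula (A2). The bracket is always a multiple of 24, so // is exact."""
    return (n**4 - 6 * n**3 + 23 * n**2 - 18 * n + 24) // 24


# Compare (A2) with the guessed pattern (A3), a_n = 2^(n-1).
for n in range(1, 8):
    guess = 2 ** (n - 1)
    print(n, a(n), guess, "agree" if a(n) == guess else "DIFFER")
```

The table agrees for n = 1 to 5 and then parts company, which is the point of the illustration: agreement on a finite list of cases proves nothing about the remaining ones.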


Let’s go back to the formula (A1). We certainly know that it is correct for n = 1. 
What we need to do is to prove: 


If the formula is true for n equal to any positive integer N, then it is also true 
for n = N + 1. (*) 


This is quite easy: if we assume 

$$S_N = \frac{N(N+1)}{2} \qquad (A4)$$

then clearly from the definition of $S_n$ we have 

$$S_{N+1} = S_N + (N + 1) \qquad (A5)$$
$$= \frac{N(N+1)}{2} + (N + 1)$$
$$= \frac{(N+1)(N+2)}{2} \qquad (A6)$$

This result (A6) is exactly the same as what we get in (A4) if we replace N by N + 1. 
We have therefore established the condition (*) in this example: if (A1) holds for 
n = N then it is also true for n = N + 1. However, we know that (A1) is correct for 
n = 1, so it must also be true for n = 2; and since it is true for n = 2, the condition (*) 
tells us it must be true for n = 3. You should now realize that we can proceed in this 
way for ever: the correctness of (A1) for n = 3 implies it is true for n = 4, and 
similarly for n = 5, and so on. Hence we can conclude that (A1) is indeed true for 
any value of the positive integer n. 

In general, the method of induction to prove that a guessed formula $S_n$ is correct 
consists of showing that it is correct for some particular value n = a (in the example 
above a = 1) and then establishing that (*) holds, from which it follows that $S_n$ holds 
for all integers $n \ge a$. 

If you are still a bit mystified, let’s see what would happen if we guessed the 
wrong formula for the sum $S_n$ of the first n positive integers. Suppose we thought that 

$$S_n = n^2 - n + 1 \qquad (A7)$$

This certainly works for n = 1 and n = 2, since 

$$S_1 = 1^2 - 1 + 1 = 1, \qquad S_2 = 2^2 - 2 + 1 = 3$$

If we assume this is true for n = N then we have 

$$S_N = N^2 - N + 1 \qquad (A8)$$

and hence 

$$S_{N+1} = S_N + (N + 1)$$
$$= N^2 - N + 1 + N + 1$$
$$= N^2 + 2 \qquad (A9)$$

However, if we replace N by N + 1 in (A8) we get 

$$S_{N+1} = (N+1)^2 - (N+1) + 1$$
$$= N^2 + 2N + 1 - N$$
$$= N^2 + N + 1$$

which is not the same as (A9). This shows that the statement (*) does not hold, so if 
(A7) holds for n = 2 it does not follow that it is also true for n = 3; indeed (A7) 
gives $S_3 = 3^2 - 3 + 1 = 7$ instead of the correct value $S_3 = 6$; the fact that (A7) works 
for both n = 1 and n = 2 is just a fluke. 
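A numerical comparison makes the contrast between the two guesses concrete. This is only an illustrative sketch (the helper names are invented here), and it is not a substitute for the induction argument: a search that finds no failure up to some bound proves nothing beyond that bound.

```python
def partial_sum(n):
    """Sum of the first n positive integers, by direct addition."""
    return sum(range(1, n + 1))


def first_failure(formula, upto):
    """Smallest n <= upto at which the formula disagrees with the true sum, else None."""
    for n in range(1, upto + 1):
        if formula(n) != partial_sum(n):
            return n
    return None


def formula_a1(n):
    return n * (n + 1) // 2      # the guess (A1)


def formula_a7(n):
    return n * n - n + 1         # the wrong guess (A7)


print("(A1) first fails at:", first_failure(formula_a1, 1000))
print("(A7) first fails at:", first_failure(formula_a7, 1000))
```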

As another example, consider the formula 

$$S_n = 2^{2n} - 1 \qquad (A10)$$

which gives 

$$S_1 = 2^2 - 1 = 3, \qquad S_2 = 2^4 - 1 = 15, \qquad S_3 = 2^6 - 1 = 63$$

and if you compute a few more values of $S_n$ you will find that each is divisible by 3. 
To prove by induction that this is true for all values of the positive integer n, we 
need to show that (*) holds. That is, if 

$$S_N = 2^{2N} - 1$$

is divisible by 3, we must show that $S_{N+1}$ is also divisible by 3. Because $S_N$ is 
divisible by 3, we can write it in the form 3k where k is a positive integer, so that 

$$2^{2N} - 1 = 3k$$

or $2^{2N} = 3k + 1$. Next, setting n = N + 1 in (A10) gives 

$$S_{N+1} = 2^{2(N+1)} - 1$$
$$= 2^{2N+2} - 1$$
$$= 2^2 \cdot 2^{2N} - 1$$
$$= 4 \cdot 2^{2N} - 1$$
$$= 4(3k + 1) - 1 = 12k + 3$$

which is again divisible by 3. We have therefore established that (*) does indeed 
hold, and since $S_1$ is divisible by 3 it follows that $S_n$ is divisible by 3 for all values 
of the positive integer n. 
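The step from $S_N$ to $S_{N+1}$ above amounts to the relation $S_{N+1} = 4S_N + 3$ (put $3k = S_N$ in the last line of the calculation). A few lines of code can spot-check both this relation and the divisibility for small n; the range of 50 values is an arbitrary choice, and the check illustrates the argument rather than replacing it.

```python
def s(n):
    """The formula (A10)."""
    return 2 ** (2 * n) - 1


# Spot-check the inductive relation and the divisibility claim.
step_ok = all(s(n + 1) == 4 * s(n) + 3 for n in range(1, 51))
divisible_ok = all(s(n) % 3 == 0 for n in range(1, 51))

print("S(N+1) = 4 S(N) + 3 for N = 1..50:", step_ok)
print("S(n) divisible by 3 for n = 1..50:", divisible_ok)
```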

You will need the technique described in this appendix to solve Exercise 1.8 and 
Problem 1.5 involving some properties of Fibonacci numbers. 


EXERCISE A1 Prove by the method of induction that each of the following results holds 
for all values of the positive integer n: 


(a) $1^2 + 2^2 + 3^2 + 4^2 + \cdots + n^2 = n(n+1)(2n+1)/6$ (see also Problem 1.12). 
(b) $2^{3n} - 1$ is divisible by 7. 


EXERCISE A2 Let $S_n$ be the sum of the odd integers from 1 to 2n - 1. By evaluating $S_1$, 
$S_2$, $S_3$, $S_4$ guess a formula for $S_n$. Use the method of induction to prove that your 
result is true for all values of n. 
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2.1 INTRODUCTION AND EXAMPLES 


Almost every product we buy in a supermarket has a barcode on it, like that shown in 
Figure 2.1. 

The bars are designed to be read by a laser scanning system, producing the string 
of numbers which identifies the item (in this case, a tube of hair cream). Turn over 
any book, and in the bottom right-hand corner of the back cover you will find 
another barcode, like that in Figure 2.2. 

This number is called the International Standard Book Number (ISBN), and 
every published book has its own unique number. Barcodes are very widely used in 
commerce and industry, and play an important role in the processing of management 
information. 

The second part of this chapter’s title refers to one of the most impressive 
technological feats of the twentieth century: the transmission of pictures to Earth by 
spacecraft visiting other planets in the solar system. The first, rather blurred, pictures 
of Mars were received in 1965, but the Mariner 9 spacecraft in 1971 sent television 
signals from Mars to produce pictures of excellent quality back on Earth. Yet the 
power of the transmitter was so small (only 20 watts) that what happened was like 
being able to see a car stoplight on Mars, which is 135 million km away! Compare 
this with a commercial television transmitter which needs about 35 000 watts for a 
range of around 80 km. In 1976 colour photographs of Mars were obtained, and in 
the 1980s spectacular pictures of Saturn, Uranus and Neptune were produced. The 
basic principle used is to break down the pictures into a large number of very small 
elements, and transmit these as numerical data. The way in which such data are 
processed at the receiving end so as to get rid of distortions and interference forms 
the topic of this chapter. 

[Figure 2.1: barcode on a supermarket product] 

[Figure 2.2: barcode on a book cover, ISBN 0-19-859665-0] 

Let’s look at a simple problem in which representing information in numerical 
form can be useful. 


EXAMPLE 2.1 


Suppose we have a list of eight names in alphabetical order: 
Ann, Bob, Carol, Dave, Ellen, Fred, Gill, Harry 


I am thinking about one name on this list, and you have to determine which one 
it is by asking me questions, to which I will answer only ‘yes’ or ‘no’. What is 
the smallest number of questions you need to ask in order to get the right 
name? 

The answer is ‘three’: first ask me whether the name is in the first half of 
the list; this narrows it down to four names. Then ask me whether the name is 
in the first half of this shorter list; this narrows it down to two names. Finally, 
ask me whether the name is the first in this last pair. For example, if I am 
thinking of ‘Carol’ my first answer is ‘yes’ (the name is in the first four, Ann to 
Dave); my second reply is ‘no’, revealing that the name is either the third or 
fourth (Carol or Dave); finally I say ‘yes’, so that you can identify the name as 
the third on the original list. 

Suppose I now change the rules of the game. You ask questions as before, 
but instead of answering one at a time I only give you my answers after you 
have asked all your questions. Do you need to ask more than three questions? 
In fact, perhaps surprisingly, you still only need three questions to identify the 
correct name. The easiest way to see this is to first assign a number to each of 
the names in the list, going from 0 for Ann, 1 for Bob, up to 7 for Harry. The 
reason for going from 0 to 7 rather than from 1 to 8 is that we can represent 
each of the numbers in binary form. You are familiar with the way decimal 
numbers work: the number 751, for example, means 7 hundreds, 5 tens, and 1 
unit, or $7 \times 10^2 + 5 \times 10^1 + 1 \times 10^0$. The position of the digit tells you what power 
of the base 10 it corresponds to. In exactly the same way, binary numbers use 
base 2, so there are only two digits 0 and 1, called bits (short for binary digits). 
For example, the binary number 10101 means 

$$1 \times 2^4 + 0 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0$$

which is 16 + 4 + 1 = 21 in decimal form. Most scientific pocket calculators have a 
button which converts binary numbers to decimal numbers, and vice versa. 

Going back to our original problem, the binary representations of the 
decimal numbers 0 to 7 are as follows: 


    decimal   0     1     2       3      4       5      6      7 
    binary    000   001   010     011    100     101    110    111 
    name      Ann   Bob   Carol   Dave   Ellen   Fred   Gill   Harry 


Notice that we only need 3 bits for each of the numbers. The binary form of 8 is 
1000, so if we had numbered the names from 1 to 8 we would have had to use 
four digits for Harry. 

To play the second version of the game, all you have to do is ask me if each 
digit is 0 reading from left to right. Remember my choice was Carol, third on 
the list, and having the code number 010, as shown in the table above. So my 
answer, after you have asked the three questions, would be ‘yes, no, yes’ — in 
fact, the same answers as in the original game. 

We say that the information about the list of names has been encoded into 
binary form. The numbers 000, 001, 010 and so on are called codewords, and 
the set of all these is a code. 
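
The second version of the game can be sketched in a few lines of Python. This is only an illustration: the function names (`encode_names`, `answers_for`, `identify`) are hypothetical and do not come from the text, and the sketch assumes the list is numbered from 0 as described above.

```python
def encode_names(names):
    """Give each name a fixed-width binary codeword, numbering from 0."""
    # Number of bits needed to write the largest index (at least 1 bit).
    width = max(1, (len(names) - 1).bit_length())
    return {name: format(i, f"0{width}b") for i, name in enumerate(names)}


def answers_for(codeword):
    """Answers to 'is this digit 0?' for each bit, read left to right."""
    return ["yes" if bit == "0" else "no" for bit in codeword]


def identify(names, answers):
    """Rebuild the codeword from the answers and look up the name."""
    bits = "".join("0" if answer == "yes" else "1" for answer in answers)
    return names[int(bits, 2)]


NAMES = ["Ann", "Bob", "Carol", "Dave", "Ellen", "Fred", "Gill", "Harry"]
code = encode_names(NAMES)
print(code["Carol"])                       # 010
print(answers_for(code["Carol"]))          # ['yes', 'no', 'yes']
print(identify(NAMES, ["yes", "no", "yes"]))  # Carol
```

All three questions can be asked before any answer is given, because each question concerns one fixed bit position of the codeword.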


EXERCISE 2.1 Analyze the game described in Example 2.1, in the cases when there are 16 
names, or when there are 32 names. 


Human counting generally uses the decimal system with base 10, although base 
12 occurs in the old Imperial measuring system of feet and inches, and in the old 
British coinage of pennies and shillings. The base 60 goes back to Babylonian times, 
and is still firmly with us for time and angle measurement (seconds, minutes, hours). 
However, computers are built from electronic components which can be in either one 
state or another (e.g. an on-off switch). It is therefore natural to use binary numbers 
to represent the two states. The decimal system undoubtedly has its origins in the fact 
that humans have five digits (four fingers and a thumb) on each of their two hands; 
so perhaps we can imagine that a computer has only one finger on each hand! 

The third component in the title of the chapter is the now-familiar compact disc 
(CD) for reproducing sound. The old-fashioned technology of the long-playing 
record relies on the sound being converted into a long wiggly groove which goes 
round and round the record. The wiggles are traced by a stylus, whose movements 
are turned back into music. Although ingenious, this technique unfortunately allows 
too much scope for distortion and loss of quality in the reproduction. In the 
technology of the compact disc, which was introduced in 1982, the musical sounds 
are decomposed into tiny individual parts which are converted into digital form: in 
just 1 second a CD player processes 1 460 000 bits of audio information. The bits are 
read off the disc by a laser beam. However, even with the most careful manufacturing 
and handling procedures, faults still occur on CDs. The reason why, despite such 
flaws, the music sounds so authentic and free from ‘clicks’ and other unwanted 
background noises is that the CD contains about twice as many additional bits of 
non-audio information. This extra information is used to process the music on its way to 
your ears, so that it ends up sounding virtually perfect. In particular, some of these 
extra bits are used to correct errors, which is what this chapter is all about. The actual 
error-correcting scheme for CDs was invented at Philips Research Laboratories in 
Eindhoven in the late 1970s, and emphasizes that in this important technological 
development the electronic engineers could not have succeeded without the 
contribution of the mathematicians! 

Incidentally, we shall not be discussing codes which are used by spies and others 
to maintain secrecy; this is another interesting branch of recent applied mathematics 
called cryptography. 

In general we wish to send information which has been converted into numerical 
form along some communication channel as reliably as possible. This might be a 
telephone line, a satellite communication link or a magnetic disk used with a 
computer. Suppose we are using a binary representation of the data (i.e. a string of 
zeros and ones), so that what is happening can be represented as in Figure 2.3. 

The message has some extra check bits appended onto it by the encoder device 
(in some cunning way) so as to produce a codeword. During transmission, so-called 
‘noise’ (i.e. external interference) causes errors to occur. The sources of the noise 
could be, for example, electrical faults or disturbances, lightning, radiation or 
human error. The string of zeros and ones which is received is no longer the same as 
the original codeword — it might be a different codeword, or it might not be a 
codeword at all, in which case it is called simply a word. The fundamental idea of an 
error-correcting code is to utilize the extra bits added by the encoder in such a way 
that after decoding the received word the original message can be recovered. What 
we shall be looking at in this chapter is how to construct some of these cunning 
schemes for adding the check bits. 

Figure 2.3 


We are already familiar with a similar idea in written language. There is often 
sufficient redundancy already present in the structure of the language itself to enable 
us to guess correctly what is meant even if there are spelling errors, or perhaps 
vowels omitted. For example, we can still understand ‘Ystrdy ws cldy’; or we can 
still make sense of ‘Tomorrow the weather will be five’, since we can be pretty sure 
that there has been a typing error and ‘five’ should have been ‘fine’. However, if a 
clothing warehouse received a telex message ‘Supply 1000 shurts’, there is no way 
of telling whether ‘shirts’ or ‘shorts’ are required. Even worse, if a bank clerk types 
in an incorrect account number during a financial transaction, then you could find 
money has been withdrawn from your account instead of somebody else’s 
(Murphy’s law suggests that it is unlikely that you would actually benefit from such 
a mistake!). 


EXAMPLE 2.2 


Suppose we want to send four commands to a robot arm involved in some 
manufacturing process: UP, DOWN, LEFT, RIGHT. We could use the following 
codewords: 

    UP    DOWN    LEFT    RIGHT 
    00    01      10      11 


However, if even a single error occurs during transmission then there is no way 
of detecting that an error has occurred, as the received message is perfectly 
legitimate. For example, if 01 is sent, but is corrupted so that 11 is received, 
then the arm moves to the right instead of downwards. 


When someone speaks to us and we don’t quite catch what they say, then we ask 
them to repeat it. So a natural way to try and correct errors is simply to repeat each 
transmitted word. However, it’s not enough just to repeat the message once. To see 
this, suppose we are still trying to tell the robot arm in Example 2.2 to move DOWN, 
and we transmit the instruction twice, namely 0101. If again a single error occurs 
(say in the first bit), and the arm receives 1101, it can certainly detect that there has 
been an error, since 11 is different from 01. However, there is still no way of 
correcting the error (that is, no way of deciding which bit is wrong) because as far 
as the receiver is concerned the original message could have been 1111 with a 
transmission error in the third bit. Now let’s try transmitting the message three times: 
010101. If a single error now occurs in transmission, say in the third bit so 011101 is 
received, not only is this error detected but it can also be corrected on a ‘best of 
three’ count for each bit, as shown: 


    011101        (2.1) 
      ^ 
      on a majority count, this bit should be 0 


Although this simple repetition code can correct any single transmission error, an 
obvious disadvantage is the need to transmit three times the actual information. This 
is not only expensive, but may be physically impractical — think of the problem 
when data are to be sent back to Earth by a space probe. Notice, as well, that since 
the extra bits (the repetitions) are themselves subject to error, accuracy cannot be 
guaranteed. However, because of the general reliability of electronic equipment, we 
are justified in making the assumption throughout this chapter that the probability of 
an error in a single bit is small, so that one transmission error is more likely than two 
(or more) errors. To see what this implies, let’s look again at the received message 
011101 in (2.1). It is certainly conceivable from the receiver’s point of view that the 
intended message was 11 (RIGHT), that we sent 111111 and two transmission errors 
occurred in the first and fifth bits, thereby producing (2.1). However, this is less 
likely to have happened than our original deduction that 010101 was sent with a 
single error occurring in the third bit. All we aim to do, then, is to make the 
probability of accuracy as high as possible. 
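
The ‘best of three’ count can be sketched as follows. This is a minimal illustration rather than a definitive decoder: the names `encode_repetition` and `decode_majority` are hypothetical, and the sketch assumes the received word has exactly `copies` blocks of `block_len` bits.

```python
def encode_repetition(message, copies=3):
    """Transmit the message `copies` times in succession."""
    return message * copies


def decode_majority(received, block_len, copies=3):
    """Decode by a majority count over the copies of each bit position."""
    blocks = [received[i * block_len:(i + 1) * block_len] for i in range(copies)]
    decoded = ""
    for position in range(block_len):
        ones = sum(block[position] == "1" for block in blocks)
        decoded += "1" if 2 * ones > copies else "0"
    return decoded


print(encode_repetition("01"))       # 010101
print(decode_majority("011101", 2))  # 01
```

Note that the same call returns 01 whether 010101 was sent with one error or 111111 was sent with two; the decoder simply picks the more likely explanation.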

It should be stressed that at this stage we are supposing that decoding of a 
received word is done on the basis of direct comparison. That is, a complete list of 
codewords is available at the receiving end, and the received word is simply 
compared with these. We choose as the transmitted codeword the one which is 
obtained from the received word with the smallest number of errors — this is called 
nearest-neighbour (NN) decoding. 


EXERCISE 2.2 A message to be transmitted consists of a single bit, 0 or 1 (YES or NO). 
The repetition code used is 


    Message     0      1 
    Codeword    000    111 


For example, if 001 is received, then assuming at most a single transmission error, we 
deduce that 000 was transmitted, so the message is decoded as 0. By considering all 
the remaining seven possible received words, show that this code corrects all single 
errors. 


A very simple but useful code is obtained by appending just one extra bit to each 
information message so that the overall number of ones is even — this produces the 
even parity code (the odd parity code similarly has an odd number of ones in each 
codeword). If any single error occurs in transmission then the number of ones in the 
received word will be odd, so that we detect the error and can then ask for the 
message to be retransmitted. 


EXAMPLE 2.3 


Consider the code in Example 2.2, and put either 0 or 1 onto the end of each 
word so as to make each new codeword contain an even number of ones. For 
example, 01 becomes 011, and the complete table of new codewords is 


UP DOWN LEFT RIGHT 
000 011 101 110 


If, say, 011 is sent and an error occurs in the second bit so that 001 is received, 
the error is immediately detected since 001 contains an odd number of ones. 
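
A short sketch of the even parity code follows. The function names `add_even_parity` and `has_even_parity` are illustrative assumptions, not notation from the text.

```python
def add_even_parity(info_bits):
    """Append one check bit so that the total number of ones is even."""
    check_bit = info_bits.count("1") % 2
    return info_bits + str(check_bit)


def has_even_parity(word):
    """True when the word contains an even number of ones."""
    return word.count("1") % 2 == 0


for command, bits in [("UP", "00"), ("DOWN", "01"), ("LEFT", "10"), ("RIGHT", "11")]:
    print(command, add_even_parity(bits))

print(has_even_parity("011"))  # True: accepted as a codeword
print(has_even_parity("001"))  # False: an error is detected
```

Flipping any one bit changes the number of ones by exactly one, which is why every single error turns an even count into an odd one.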


This extra bit is called the parity-check bit, or simply the check bit, and the 
original bits are called the information bits. We shall only deal with so-called block 
codes where each codeword has the same length, which is the total number of 
symbols; this is also defined to be the length of the code itself. In Example 2.3 the 
codewords therefore have length 3. Codes in which the codewords have variable 
lengths are also used, the most famous being the Morse code which was particularly 
popular in the days before the radio transmission of speech had been invented. This 
code takes advantage of the relative frequencies of letters in English, so that a single dot (.) 
stands for E, and dot-dash-dash-dash (. - - -) for J, for example. A snag is that it is difficult to recognize the end 
of a codeword, or the start of another. 


EXAMPLE 2.4 


For a code of length 7 with one even parity-check bit, suppose we wish to send 
the information message 110111. To make the overall parity even we append 
the check bit 1, and transmit the codeword 1101111. If this is the word which is 
received, since the overall parity is even we infer that no errors have occurred 
in transmission, so on dropping the check bit we correctly decode the 
information message as 110111. However, if a single transmission error 
occurs the overall parity of the received word will be odd — for example, some 
possible received words containing a single error are 


0101111, 1100111, 1101110 


We certainly detect that an error has occurred, since each of these words has 
odd parity (they contain five ones). However, we cannot determine in which bit 
an error has occurred, so we decode the message by reporting ‘Error’. 


EXERCISE 2.3 For a code of length 6 with one even parity-check bit, the following words 
are received: 110001, 001100, 101010, 111110. Decode them, assuming at most one 
transmission error has occurred. For the last word, give two possible transmitted 
codewords which differ from this received word in only a single bit. 


EXAMPLE 2.5 


Not all codes are binary. Indeed, the example of a product barcode shown in 
Figure 2.1 is a decimal code, under the European Article Number (EAN) system. 
Codewords have the form 

$x_1 x_2 x_3 x_4 \ldots x_{10} x_{11} x_{12} x_{13}$ 

where each of the symbols $x_i$ can be any of the decimal digits 0, 1, 2, 3, ..., 8, 9. 
In this code, the first two digits $x_1$ and $x_2$ are allocated to countries, with 50 
belonging to the United Kingdom. The next five digits (in this example 10611) 
give the manufacturer's number, and the next five (here 18171) give the 
unique number identifying the particular product. The last digit $x_{13}$ is the check 
digit, calculated so that the check sum 

$x_1 + x_3 + x_5 + x_7 + x_9 + x_{11} + 3(x_2 + x_4 + x_6 + x_8 + x_{10} + x_{12}) + x_{13}$    (2.2) 

is a multiple of 10. In this example we have 

$5 + 1 + 6 + 1 + 8 + 7 + 3(0 + 0 + 1 + 1 + 1 + 1) + x_{13} = 40 + x_{13}$ 

so the check digit $x_{13}$ is zero. Notice that this code detects all errors in a single 
digit, because if any one of the digits $x_1, x_2, \ldots, x_{13}$ is incorrect, the check sum 
(2.2) cannot be a multiple of 10. 
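
The check sum (2.2) is easy to compute mechanically. The following is a minimal sketch under the assumption that the barcode is supplied as a string of decimal digits; the function names are hypothetical.

```python
def ean13_check_sum(digits):
    """Check sum (2.2) for a 13-digit string x1 ... x13."""
    values = [int(ch) for ch in digits]
    odd_positions = sum(values[0:12:2])    # x1, x3, ..., x11
    even_positions = sum(values[1:12:2])   # x2, x4, ..., x12
    return odd_positions + 3 * even_positions + values[12]


def ean13_check_digit(first_twelve):
    """Choose x13 so that the check sum becomes a multiple of 10."""
    values = [int(ch) for ch in first_twelve]
    partial = sum(values[0::2]) + 3 * sum(values[1::2])
    return (-partial) % 10


def is_valid_ean13(digits):
    """True for a 13-digit string whose check sum is 0 modulo 10."""
    return len(digits) == 13 and digits.isdigit() and ean13_check_sum(digits) % 10 == 0


print(ean13_check_sum("5010611181710"))   # 40
print(ean13_check_digit("501061118171"))  # 0
print(is_valid_ean13("5010611181710"))    # True
```

The weights 1 and 3 have no factor in common with 10, so changing any single digit changes the check sum by an amount that is not a multiple of 10.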


The barcode is read by a laser scanner which works on the ratios of the widths of 
light and dark bars. You can see this in action at the checkout counters of most 
supermarkets and stores. The price of the product, however, is not part of the 
barcode. That information is held in the store’s computer, which informs the 
electronic cash register of the price of each item as its barcode is scanned. 


EXERCISE 2.4 Determine the check digit for the product having number 501015256020. 


Another common kind of error which arises when entering numbers onto a 
keyboard, or reading them aloud over the telephone, is inadvertently to transpose 
(i.e. interchange) two adjacent digits. Suppose, for example, that in the product 
barcode in Example 2.5 $x_4$ and $x_5$ are transposed, and that $x_4 \neq x_5$ (if $x_4 = x_5$, 
transposition has no effect!). The change in the check sum can be calculated in the 
following way. In (2.2) the term $x_5$ is replaced by $x_4$, and the term $3x_4$ is replaced by 
$3x_5$, so the net change in the check sum is 

$-x_5 + x_4 - 3x_4 + 3x_5 = 2(x_5 - x_4)$    (2.3) 

This shows that the transposition error will not be detected if $x_5 - x_4 = \pm 5$, since the 
check sum will then remain a multiple of 10. All other transposition errors will be 
detected, however. For example, if the barcode in Figure 2.1 was incorrectly read as 
5010161181710 (the digits $x_5 = 6$ and $x_6 = 1$ being transposed) the new check sum 
would be 

$5 + 1 + 1 + 1 + 8 + 7 + 3(0 + 0 + 6 + 1 + 1 + 1) + 0 = 50$ 

which is still a multiple of 10. Hence the error of transposing the fifth and sixth 
digits would go undetected (here $x_5 - x_6 = 6 - 1 = 5$). If, however, the eleventh and 
twelfth digits were transposed, giving 5010611181170 (here $x_{11} - x_{12} = 7 - 1 = 6$), 
the new check sum would be 

$5 + 1 + 6 + 1 + 8 + 1 + 3(0 + 0 + 1 + 1 + 1 + 7) + 0 = 52$ 

which is not a multiple of 10, so the error is detected. 
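
The effect of adjacent transpositions on the barcode 5010611181710 can be checked by brute force. The sketch below is illustrative only; `swap_adjacent` and `undetected_transpositions` are hypothetical names, and positions are counted from 1 as in the text.

```python
def ean13_is_valid(digits):
    """True when the check sum (weights 1, 3, 1, 3, ..., 1) is 0 modulo 10."""
    values = [int(ch) for ch in digits]
    return (sum(values[0::2]) + 3 * sum(values[1::2])) % 10 == 0


def swap_adjacent(digits, position):
    """Swap the digits at 1-based positions `position` and `position + 1`."""
    i = position - 1
    return digits[:i] + digits[i + 1] + digits[i] + digits[i + 2:]


def undetected_transpositions(digits):
    """Positions where swapping two different adjacent digits passes the check."""
    missed = []
    for position in range(1, len(digits)):
        swapped = swap_adjacent(digits, position)
        if swapped != digits and ean13_is_valid(swapped):
            missed.append(position)
    return missed


print(swap_adjacent("5010611181710", 5))                # 5010161181710
print(undetected_transpositions("5010611181710"))       # [1, 5]
```

Both reported positions involve a pair of adjacent digits differing by 5 (5 and 0, then 6 and 1); the swap of the eleventh and twelfth digits does not appear in the list because it is detected.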


EXERCISE 2.5 Suppose that the factor 3 multiplying the sum of the even-numbered digits in 
the check sum (2.2) is replaced by 4 or 5. Explain why in both of these cases not all 
single-digit errors would be detected. Investigate what would happen to transposition errors. 


EXAMPLE 2.6 


The United States postcode, known as the ‘Zip code’, consists of nine decimal 
digits. For example, a business return envelope for SIAM (Society for Industrial 
and Applied Mathematics) is shown in Figure 2.4. 


Figure 2.4 (a first-class business return envelope addressed to SIAM, P.O. Box 7260, 
Philadelphia, PA 19101-7260, with the Zip barcode printed below the address) 



The first digit represents one of 10 geographical areas, usually a group of 
States, from 0 in the northeast to 9 in the far west. The next two digits identify a 
mail-distribution centre; the next two represent the town, or local post office. The 
code was first introduced in 1963, and the last four digits (7260 in Figure 2.4) were 
added in 1983 to facilitate computerized sorting. The first two digits of the four-digit 
suffix represent a delivery sector (e.g. a group of streets) and the last two narrow 
down the area still further, for example one floor in a large office building. The 
barcode shown in Figure 2.4 actually includes a tenth digit, which is the check digit. 
The mathematics involved in interpreting the Zip code is interesting and 
straightforward, and you are invited to explore this in Problem 2.1 at the end of the chapter. 


It is useful at this point to introduce a special language of what are called 
congruences. If a and b are integers and their difference $a - b$ is a multiple of a third 
integer m (assumed positive), we say that a is congruent to b modulo m, and write 

$a \equiv b \pmod{m}$ 

In other words, there is another integer k such that 

$a - b = km$  or  $a = b + km$ 


You are already familiar with congruences from everyday life. For example, if the 
hands of a conventional clock show 12 noon, then 80 minutes later the minute hand 
shows 20 minutes past the hour; that is, we read the minutes on a clock modulo 60. 
Similarly, 13 hours after noon the hour hand is against the digit 1, since we read 
hours modulo 12. In a similar fashion, calendars are used modulo 7 for days of the 
week — if 5 August is a Monday, then 12 August will also be a Monday. 


EXAMPLE 2.7 


We write $59 \equiv 4 \pmod{11}$, since $59 - 4 = 5 \times 11$, and $37 \equiv -3 \pmod{10}$ since 
$37 - (-3) = 4 \times 10$. 


EXERCISE 2.6 What time does a conventional clock (i.e. with hands) show 

(i) 17 hours after it shows 2 o’clock 
(ii) 80 hours after it shows 11 o’clock 
(iii) 40 hours before it shows noon? 


In Example 2.5 on the product barcode, we can now say that the check sum (2.2) 
is required to be congruent to 0 (mod 10). 

Congruences have many properties which are the same as those for equalities. 
For example, if a and b are two integers such that 

$a \equiv b \pmod{m}$    (2.4) 

then we can add any other integer c to both sides to produce 

$a + c \equiv b + c \pmod{m}$ 

since this means that $(a + c) - (b + c) = a - b$ is a multiple of m. Similarly, we can 
subtract c from both sides of (2.4) to give 

$a - c \equiv b - c \pmod{m}$ 

and we can multiply both sides of (2.4) by c, giving 

$ac \equiv bc \pmod{m}$ 

since this means $ac - bc = (a - b)c$ is a multiple of m. 

More care is needed when dividing both sides of a congruence (2.4) by an 
integer c. For example, we have $14 \equiv 8 \pmod{6}$, but dividing both sides by 2 gives 
$7 \equiv 4 \pmod{6}$, which is not true, since $7 - 4$ is not a multiple of 6. However, division 
of both sides of (2.4) by c is valid provided c and m have no common factor other than 1. For 
example, for the congruence $14 \equiv 2 \pmod{3}$ division throughout by 2 is now 
permissible since 2 and m (= 3) are relatively prime, so this division gives the 
correct congruence $7 \equiv 1 \pmod{3}$. 
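
These statements can be checked numerically. The helper `congruent` below is a hypothetical name introduced only for this sketch; `math.gcd` is the standard-library greatest common divisor.

```python
from math import gcd


def congruent(a, b, m):
    """True when a - b is a multiple of m, i.e. a is congruent to b modulo m."""
    return (a - b) % m == 0


# The two congruences of Example 2.7.
print(congruent(59, 4, 11), congruent(37, -3, 10))   # True True

# Adding, subtracting or multiplying by c preserves a congruence.
a, b, m, c = 14, 8, 6, 5
print(congruent(a + c, b + c, m), congruent(a - c, b - c, m), congruent(a * c, b * c, m))

# Division by 2 fails modulo 6, where gcd(2, 6) = 2 ...
print(congruent(14, 8, 6), congruent(7, 4, 6), gcd(2, 6))   # True False 2

# ... but succeeds modulo 3, where gcd(2, 3) = 1.
print(congruent(14, 2, 3), congruent(7, 1, 3), gcd(2, 3))   # True True 1
```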

Carrying out operations on integers in the way we have just described is called 
modular arithmetic. You must be thinking that we have drifted far away from our 
discussion of codes! In fact, what we have learnt is indeed relevant, as we now 
demonstrate. 


EXAMPLE 2.8 


Let's look again at the International Standard Book Number (ISBN), an example 
of which was displayed in Figure 2.2. In general an ISBN is a 10-digit codeword 
$x_1 x_2 \ldots x_9 x_{10}$ which uniquely identifies a book. The first digit, $x_1$, denotes the 
country (the UK, USA and some others have 0), $x_2 x_3$ is the publisher's number 
(e.g. 13 is Prentice Hall), the next six digits are the book number assigned by 
the publisher, and the last digit $x_{10}$ is the check digit. In fact, there are 
variations: some countries are represented by more than one digit (e.g. 
Denmark is 87) and some publishers by more than two digits (e.g. 
Wiley-Interscience is 471), in which cases the assigned book number has fewer 
than six digits. The digits $x_1$ to $x_9$ can be any decimal digit from 0 to 9, but $x_{10}$ 
can also take the value 10, for which the Roman numeral X is used. The check 
digit $x_{10}$ is chosen so that the check sum, which is defined to be 

$\sum_{i=1}^{10} i x_i = x_1 + 2x_2 + 3x_3 + 4x_4 + 5x_5 + 6x_6 + 7x_7 + 8x_8 + 9x_9 + 10x_{10}$ 

is congruent to 0 (mod 11); that is, it is a multiple of 11. Notice that we are using a 
congruence with m = 11, which is a prime number (i.e. has no factors except 1 
and 11). We shall study the ISBN in some detail in Section 2.5, where we'll see 
that the choice of (mod 11) rather than (mod 10) is crucial in endowing the 
code with the desirable properties of being able to detect all single-digit errors 
and all errors involving the interchange of two digits. We'll also find that 
modular arithmetic forms one way of constructing what are called finite fields. 
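
The ISBN check sum can be sketched as below. The function names are hypothetical, and the sample numbers 0-13-110362-8 and 0-8044-2957-X are illustrative values chosen for this sketch, not examples taken from the text. The check-digit function uses the fact that 10 is congruent to -1 (mod 11), so $x_{10}$ must be congruent to the weighted sum of the first nine digits.

```python
def isbn10_values(isbn):
    """Convert an ISBN-10 string to the list x1, ..., x10 (X counts as 10)."""
    cleaned = isbn.replace("-", "").replace(" ", "").upper()
    if len(cleaned) != 10:
        raise ValueError("an ISBN-10 has exactly 10 symbols")
    values = []
    for position, symbol in enumerate(cleaned, start=1):
        if symbol == "X" and position == 10:
            values.append(10)
        elif symbol.isdigit():
            values.append(int(symbol))
        else:
            raise ValueError(f"unexpected symbol {symbol!r}")
    return values


def isbn10_check_sum(isbn):
    """Weighted sum 1*x1 + 2*x2 + ... + 10*x10."""
    return sum(i * x for i, x in enumerate(isbn10_values(isbn), start=1))


def isbn10_is_valid(isbn):
    return isbn10_check_sum(isbn) % 11 == 0


def isbn10_check_digit(first_nine):
    """Check symbol for nine digits: since 10 = -1 (mod 11), x10 = sum(i*xi) mod 11."""
    value = sum(i * int(ch) for i, ch in enumerate(first_nine, start=1)) % 11
    return "X" if value == 10 else str(value)


print(isbn10_check_sum("0-13-110362-8"))   # 187, which is 11 * 17
print(isbn10_is_valid("0-13-110362-8"))    # True
print(isbn10_check_digit("080442957"))     # X
```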
EXAMPLE 2.9 


The identification number on money orders issued by the United States Postal 
Service consists of 10 digits $x_1 \ldots x_{10}$ and a check digit, each digit $x_i$ being 
allowed to take any value from 0 to 9. The check digit $x_{11}$ is defined to be the 
remainder modulo 9 of the 10-digit number. To simplify the discussion, let's 
consider instead a five-digit codeword $x_1 x_2 x_3 x_4 x_5$, where the check digit $x_5$ is 
defined by the same rule, namely 

$x_1 x_2 x_3 x_4 \equiv x_5 \pmod{9}$    (2.5) 

For example, if the first four digits are 5370 then $5370 \equiv 6 \pmod{9}$ so $x_5 = 6$ 
(since $5370 = 9 \times 596 + 6$) so the codeword is 53706. However, if the 0 is replaced 
by 9, the congruence becomes $5379 \equiv 6 \pmod{9}$ so that the check digit is the 
same, despite the error in $x_4$. In other words, this error in $x_4$ cannot be detected: 
the substitution of 9 for 0 does not affect the value of the check digit $x_5$. 
Clearly, the same applies if $x_4$ had originally been 9 and was replaced by 0. In 
fact, this is true for any of the first four digits. As another example, suppose $x_2$ 
was 0 and is replaced by 9, so that $x_1 0 x_3 x_4$ becomes $x_1 9 x_3 x_4$. The change in the 
four-digit number is 900, which is congruent to 0 (mod 9). By carrying on with this argument, 
you should be able to convince yourself that substitution of 0 by 9, or 9 by 0, in 
any one of $x_1$, $x_2$, $x_3$ or $x_4$ goes undetected. This does not apply to the check 
digit $x_5$, where any error will be detected since (2.5) will be violated. 

This code is therefore not a very good one, and it’s interesting to work out 
just how poor it is, by calculating what percentage of single errors will go 
undetected. First of all, we see that since each of $x_1$, $x_2$, $x_3$, $x_4$ can take one of 
the 10 values 0 to 9, there are a total of $10^4$ possible codewords (once $x_1 x_2 x_3 x_4$ 
is fixed, $x_5$ is determined uniquely by (2.5)). Let’s count the number of words 
containing a single error. An error can occur in a single digit in nine possible 
ways. Hence there are $9 \times 10^4$ words which have the first digit incorrect, but the 
other four digits correct. The same applies for each of the other four digits, so 
in total there are $5 \times 9 \times 10^4 = 45 \times 10^4$ words containing a single error. To count 
the undetected single errors, consider those occurring in the first digit. There 
are $10^3$ codewords having 0 in the first position, which if replaced by 9 go 
undetected; the same holds for the $10^3$ codewords having 9 in the first position 
when it is replaced by 0, so altogether $2 \times 10^3$ errors in the first 
digit are undetected. Since undetected errors of this type can occur in any of 
the first four digits, the total number is $4 \times 2 \times 10^3 = 8 \times 10^3$. Hence the detection 
rate for single errors is 

$\frac{\text{no. of detected errors}}{\text{total no. of errors}} = \frac{45 \times 10^4 - 8 \times 10^3}{45 \times 10^4} = \frac{442}{450}$ 

or approximately 98.2%. Thus about 1.8% of single errors are undetected. 
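
The count above can be confirmed by exhaustive enumeration of the simplified five-digit code. The sketch assumes, as in the text, that the check digit is the remainder of the four-digit number on division by 9; the function names are hypothetical.

```python
from fractions import Fraction


def check_digit(number):
    """Check digit of the simplified code: remainder of the number modulo 9."""
    return number % 9


def count_single_errors():
    """Return (total, undetected) over all codewords and all single-digit errors."""
    total = 0
    undetected = 0
    for number in range(10 ** 4):
        codeword = f"{number:04d}{check_digit(number)}"
        for position in range(5):
            for wrong in "0123456789":
                if wrong == codeword[position]:
                    continue
                received = codeword[:position] + wrong + codeword[position + 1:]
                total += 1
                # The error slips through if the received word still satisfies the rule.
                if check_digit(int(received[:4])) == int(received[4]):
                    undetected += 1
    return total, undetected


total, undetected = count_single_errors()
rate = Fraction(total - undetected, total)
print(total, undetected)      # 450000 8000
print(rate, float(rate))      # 221/225, about 0.982
```

The fraction 221/225 is simply 442/450 in lowest terms.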

This code is much worse when it comes to errors where two consecutive 
digits are transposed. Suppose $x_1 x_2 x_3 x_4 x_5$ is incorrectly recorded as $x_2 x_1 x_3 x_4 x_5$. 
The difference between the two four-digit numbers is 

$x_1 x_2 x_3 x_4 - x_2 x_1 x_3 x_4 = x_1 \cdot 10^3 + x_2 \cdot 10^2 + x_3 \cdot 10 + x_4 - (x_2 \cdot 10^3 + x_1 \cdot 10^2 + x_3 \cdot 10 + x_4)$ 

$= 900(x_1 - x_2)$ 

which is obviously divisible by 9, irrespective of the values $x_1$ and $x_2$. Hence the 
check digit determined from (2.5) is unaltered, so this error goes undetected. 
You should again convince yourself that this happens for transposition errors 
involving $x_2$ and $x_3$ or $x_3$ and $x_4$. However, if $x_4$ and $x_5$ are transposed then the 
error will be detected because the congruence (2.5) will not be satisfied (see 
Exercise 2.7 below). Hence the only transposition errors which are detected are 
those involving the check digit. 
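
For the codeword 53706 used earlier, the four adjacent transpositions can be tried one by one. This is a small sketch with a hypothetical helper name, again assuming that the check digit is the remainder modulo 9 of the first four digits.

```python
def transposition_detected(codeword, position):
    """Swap the digits at 1-based positions `position` and `position + 1`,
    then report whether the received word fails the check (2.5)."""
    i = position - 1
    swapped = codeword[:i] + codeword[i + 1] + codeword[i] + codeword[i + 2:]
    return int(swapped[:4]) % 9 != int(swapped[4])


codeword = "53706"
for position in range(1, 5):
    print(position, transposition_detected(codeword, position))
# 1 False
# 2 False
# 3 False
# 4 True
```

Only the swap involving the check digit is caught, in line with the conclusion reached above.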


EXERCISE 2.7 Show that the congruence (2.5) is the same as 

$x_1 + x_2 + x_3 + x_4 \equiv x_5 \pmod{9}$ 

(Hint: write $x_1 x_2 x_3 x_4 = 10^3 x_1 + 10^2 x_2 + 10 x_3 + x_4$.) 

Hence show that if $x_1 x_2 x_3 x_4 x_5$ (with $x_4 \neq x_5$) is erroneously replaced by 
$x_1 x_2 x_3 x_5 x_4$ then $x_4$ is not the check digit for $x_1 x_2 x_3 x_5$, so this transposition error is 
detected. (For a generalization, see Problem 2.2 at the end of the chapter.) 


EXERCISE 2.8 The identity number on machine-readable passports is a seven-digit 
codeword $x_1 x_2 x_3 x_4 x_5 x_6 x_7$. The first six digits are the date of birth in the form 

    $x_1 x_2$    $x_3 x_4$    $x_5 x_6$ 
    day        month      year 

and the check digit $x_7$ is chosen so as to satisfy 

$x_7 + 7(x_1 + x_4) + 3(x_2 + x_5) + x_3 + x_6 \equiv 0 \pmod{10}$ 

Confirm that this code detects all single-digit errors. Investigate what happens to 
transposition errors, giving careful attention to the transposition of x, and x;, and of 
Xs and x,. 


2.2 HAMMING DISTANCE 


Let’s go back to the situation in Example 2.2, where we had four messages 00, 01, 
10, 11 to be sent. The difficulty was that if a single transmission error occurs then an 
incorrect message is received. This is because the messages are ‘too close together’: 
an error in just 1 bit changes one word into another. We can make this idea precise 
by defining the Hamming distance, first suggested by R.W. Hamming in 1950. If 

$a = a_1 a_2 \ldots a_n$,    $b = b_1 b_2 \ldots b_n$ 

are two words each of length n, then the Hamming distance $\delta(a, b)$ between them is 
the number of places in which they differ. 


EXAMPLE 2.10 


For a binary code of length 5 we have 

$\delta(01010, 11001) = 3$ 

since a = 01010 and b = 11001 differ in the first, fourth and fifth places. Similarly, 
for a decimal code of length 4 we have $\delta(9172, 8272) = 2$, since the words differ 
in the first and second places. 
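
The definition translates directly into code. In this sketch words are represented as strings of symbols, and `hamming_distance` is a name chosen here for illustration.

```python
def hamming_distance(a, b):
    """Number of places in which two words of equal length differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("01010", "11001"))  # 3
print(hamming_distance("9172", "8272"))    # 2
```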


If we think of the two words a and b as being ‘points’ in space, then in order to 
‘travel’ from one word to the other we have to change precisely $\delta(a, b)$ digits; this 
number of changes is what we call the Hamming distance. More precisely, this 
distance justifies its name since it has exactly the same three mathematical properties 
as the normal concept of geometrical distance between two points in space. Two of 
these properties are obvious: first, the distance between a and b is zero if and only if 
a and b coincide, that is 

$\delta(a, b) = 0$, if and only if a = b 

and otherwise $\delta(a, b) > 0$. 

Secondly, the distance from one word to another doesn’t depend on the 
‘direction’ of travel, that is 

$\delta(a, b) = \delta(b, a)$, for all a and b 

The third property is called the triangle inequality for the following reason. Suppose 
we have three points A, B, C forming a plane triangle, as shown in Figure 2.5, with 
d(A, B) denoting the geometrical distance between A and B, and similarly for the 
other two sides. 

It is clear that 

$d(A, B) \leq d(A, C) + d(C, B)$ 

The corresponding result for the Hamming distance looks just the same: for any third 
codeword $c = c_1 c_2 \ldots c_n$ we have 

$\delta(a, b) \leq \delta(a, c) + \delta(c, b)$    (2.6) 


Figure 2.5 


To verify (2.6), we must realize that another way of thinking of the Hamming 
distance $\delta(a, b)$ is that it is the smallest number of changes to digits of the codeword 
a needed to produce b. For example, we saw in Example 2.10 that 
$\delta(9172, 8272) = 2$, and the first two digits of a = 9172 need to be changed in order to 
obtain b = 8272. In general, in order to change a into b we can first change it to c, 
which requires $\delta(a, c)$ changes, and then go from c to b, requiring a further $\delta(c, b)$ 
changes. Since $\delta(a, b)$ is the smallest number of changes needed to go from a to b, 
it can’t be bigger than the sum of $\delta(a, c)$ and $\delta(c, b)$, so we have proved (2.6). 

The use of the term ‘nearest-neighbour’ (NN) decoding, which we introduced 
earlier, can now be reinterpreted: we choose as the most likely transmitted word the 
one which is nearest (as measured by the Hamming distance) to the received word. 
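
Nearest-neighbour decoding by direct comparison might be sketched as follows. The function `nearest_neighbours` is a hypothetical name; it returns every codeword at the smallest distance, so that ties, which cannot be resolved, are visible to the caller.

```python
def hamming_distance(a, b):
    """Number of places in which two words of equal length differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbours(received, code):
    """All codewords at the smallest Hamming distance from the received word."""
    distances = {codeword: hamming_distance(received, codeword) for codeword in code}
    smallest = min(distances.values())
    return [codeword for codeword in code if distances[codeword] == smallest]


# Each 2-bit message sent three times: a unique nearest codeword.
triple = ["000000", "010101", "101010", "111111"]
print(nearest_neighbours("011101", triple))   # ['010101']

# Each 2-bit message sent twice: the error is detected but cannot be corrected.
double = ["0000", "0101", "1010", "1111"]
print(nearest_neighbours("1101", double))     # ['0101', '1111']
```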

You should by now have some feeling that a crucial parameter affecting the 
properties of a code is going to be the overall closeness, in the Hamming distance 
sense, of codewords. This is measured by the minimum distance d, which is the 
smallest value of all the distances between all possible pairs of (different) 
codewords. 


EXAMPLE 2.11 


For the code in Example 2.2 we can very easily work out that 

$\delta(00, 01) = 1$,  $\delta(00, 10) = 1$,  $\delta(00, 11) = 2$ 
$\delta(01, 10) = 2$,  $\delta(01, 11) = 1$,  $\delta(10, 11) = 1$ 

This shows that the minimum distance is 1, and explains why the code is 
useless. For any code which has minimum distance 1, there will (by definition) 
be at least one pair of codewords a and b for which $\delta(a, b) = 1$. If a is sent, and 
an error occurs in that one particular digit in which a differs from b, then the 
codeword b will be received, which will be assumed correct; there is no way 
of telling that an error has occurred. Hence a code whose minimum distance is 
1 cannot even detect all single errors. 
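
The minimum distance is found by comparing every pair of different codewords. A minimal sketch, with `minimum_distance` as an illustrative name and the codes taken from Examples 2.2 and 2.3:

```python
from itertools import combinations


def hamming_distance(a, b):
    """Number of places in which two words of equal length differ."""
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code):
    """Smallest Hamming distance over all pairs of different codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


print(minimum_distance(["00", "01", "10", "11"]))      # 1: the code of Example 2.2
print(minimum_distance(["000", "011", "101", "110"]))  # 2: the even parity code of Example 2.3
```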


EXERCISE 2.9 Determine the minimum distance d for each of the following binary codes: 


(a) C={1000, 1011, 0100} 
(b) C= {000000, 101010, 010101, 111001, 011110}. 


EXERCISE 2.10 For the code (b) in the previous exercise, determine the distances of each 
of the following received words from the codewords, and hence decode each of the 
received words using the NN principle: 


(a) 100010, (b) 000101, (c) 000110. 


However, if a code has minimum distance 2 and we transmit a codeword a, any 
single error will result in a received word which is not a codeword (since all 
codewords differ from a in at least two places). Hence any single error will always 
be detected. What about error correction? Unfortunately, if d = 2 there will always 
be at least one uncorrectable single error. To see this, suppose for simplicity that the 
code has length 4, and consider two codewords 

$a = a_1 a_2 a_3 a_4$,    $b = a_1 b_2 a_3 b_4$ 

with $b_2 \neq a_2$, $b_4 \neq a_4$, so that $\delta(a, b) = 2$. Suppose a is transmitted and a single error 
occurs in $a_2$, causing it to become $b_2$, so the received word is $c = a_1 b_2 a_3 a_4$. Clearly 
$\delta(c, a) = 1$ and $\delta(c, b) = 1$. Thus c cannot be a codeword, since all codewords are 
distance 2 at least apart, and hence an error is detected. However, there is no way of 
deciding by the NN principle whether a or b was sent, so the error cannot be 
corrected. You should have no difficulty in seeing how the argument still applies 
whatever the wordlength. 

We are beginning to see how we can quantify our intuitive feeling that the 
further apart codewords are (in the sense of Hamming distance), the better will be 
the code from the point of view of coping with errors. Let’s explore this with an 
example having d=3, before trying to get a general result. 


EXAMPLE 2.12


We return to the repetition code, introduced in Example 2.2. Each 2-bit message
was transmitted three times, giving the codewords

$$a_1 = 000000, \quad a_2 = 010101, \quad a_3 = 101010, \quad a_4 = 111111$$

To find the minimum distance it is convenient to display the distances between
pairs of codewords in the following tabular form:

$$\begin{array}{c|cccc}
 & a_1 & a_2 & a_3 & a_4 \\ \hline
a_1 & - & 3 & 3 & 6 \\
a_2 & 3 & - & 6 & 3 \\
a_3 & 3 & 6 & - & 3 \\
a_4 & 6 & 3 & 3 & -
\end{array}$$

The smallest number appearing in the table is 3, which is therefore the
minimum distance for this repetition code. Notice that the table is symmetrical
relative to the principal diagonal (top left corner to bottom right); that is, the
numbers in the first row are the same as those in the first column, the second
row is identical to the second column, and so on. This is because of the
property $\delta(a, b) = \delta(b, a)$, so a table of distances for any code will always be
symmetric. Only the upper triangular part need therefore be recorded.
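
The distance table of Example 2.12 can be reproduced mechanically. The following is a minimal Python sketch; the function names `hamming_distance` and `minimum_distance` are illustrative choices, not taken from the text, and codewords are assumed to be stored as strings of 0s and 1s.

```python
from itertools import combinations


def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b))


def minimum_distance(code: list[str]) -> int:
    """Smallest distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))


# Repetition code of Example 2.12.
repetition_code = ["000000", "010101", "101010", "111111"]

# Print the full (symmetric) table; the diagonal shows 0 rather than '-'.
for a in repetition_code:
    print(a, [hamming_distance(a, b) for b in repetition_code])

print("minimum distance:", minimum_distance(repetition_code))
```

Because only unordered pairs are examined by `combinations`, the sketch effectively uses just the upper triangular part of the table.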


We saw that this repetition code will always correct a single transmission error,
and in fact this is true for any code having minimum distance 3. This is a special
case of the second part of the following important result.

Theorem 2.1


Let C be a code having minimum distance d.

(i) C will detect e errors using the NN principle provided

$$d \geq e + 1 \tag{2.7}$$

(ii) C will correct e errors using the NN principle provided

$$d \geq 2e + 1 \tag{2.8}$$


Proof


(i) From the definition of minimum distance, all codewords differ in at least d
places, i.e. by (2.7), in at least e + 1 places. Therefore, if a codeword is
transmitted and at most e transmission errors occur, then the received word
cannot be a codeword. Hence these errors are detected.

(ii) Suppose a codeword a is sent and e errors occur, so that a word c is
received for which

$$\delta(a, c) = e \tag{2.9}$$

Let b be any other codeword different from a, so we have $d \leq \delta(a, b)$, and
therefore by (2.8)

$$2e + 1 \leq \delta(a, b) \tag{2.10}$$

Substitute (2.9) and (2.10) into the triangle inequality (2.6) to obtain

$$2e + 1 \leq \delta(a, b) \leq \delta(a, c) + \delta(c, b) = e + \delta(c, b)$$

from which we get $\delta(c, b) \geq e + 1$. Thus b is a distance greater than e from
the received word c, so a is the only codeword within a distance e from
c. The NN principle therefore correctly decodes the received word to
produce the original message a.


Theorem 2.1 can be interpreted as saying that a code with minimum distance
d can be used either to detect $d - 1$ errors, or to correct $\frac{1}{2}(d - 1)$ errors (if d is
odd) and $\frac{1}{2}(d - 2)$ errors (if d is even). This agrees with what we have already
discovered for d = 1, 2, 3. Notice that the proof is not in any way restricted to
binary codes. It’s interesting that the Mariner 9 code mentioned in Section 2.1
had length 32, 26 check bits and minimum distance 16, and so could correct up to
seven errors.
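
The two bounds of Theorem 2.1 are easy to tabulate. Here is a small Python sketch; the function names are hypothetical, chosen only for illustration. Integer division covers the odd and even cases of d in a single expression.

```python
def errors_detected(d: int) -> int:
    """Largest e with d >= e + 1, from (2.7)."""
    return d - 1


def errors_corrected(d: int) -> int:
    """Largest e with d >= 2e + 1, from (2.8): (d-1)/2 for odd d, (d-2)/2 for even d."""
    return (d - 1) // 2


for d in (1, 2, 3, 16):
    print(d, errors_detected(d), errors_corrected(d))
```

For d = 16 the second function returns 7, in agreement with the figure quoted for the Mariner 9 code.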


EXAMPLE 2.13


Consider the code 
C={00110, 10001, 01011, 11100} 


It is left as an exercise for you to check that the minimum distance is d=3. The 
theorem tells us that this code can either correct one error, or detect two. 

For example, suppose that 00011 is received. The distances from the four 
codewords are respectively 2, 2, 1, 5. Thus the received word is nearest to the 
third codeword, so we decide by the NN principle that 01011 was transmitted, 
with a single error in transmission (in the second bit). 

However, if 01101 is received then the respective distances from the four 
codewords are 3, 3, 2, 2 so we cannot decide by the NN principle whether the 
third or fourth codeword was transmitted. However, we have detected that 
there are two errors — for example, 01101 could have come from 11100 with 
errors in the first and last bits. 
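
The two decoding attempts in Example 2.13 can be checked with a short Python sketch of the NN principle. The helper names are illustrative, and returning `None` for a tie is an assumption of this sketch (the text simply says that no decision can be made).

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def nn_decode(received: str, code: list[str]):
    """Return the unique nearest codeword, or None when the nearest is not unique."""
    distances = [hamming_distance(received, c) for c in code]
    best = min(distances)
    nearest = [c for c, dist in zip(code, distances) if dist == best]
    return nearest[0] if len(nearest) == 1 else None


code = ["00110", "10001", "01011", "11100"]

print(nn_decode("00011", code))  # distances 2, 2, 1, 5: a unique nearest codeword
print(nn_decode("01101", code))  # distances 3, 3, 2, 2: a tie, so no decision
```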


EXERCISE 2.11 What is the smallest possible minimum distance that a code must have in 
order to correct two errors? How many errors will it detect? 


EXERCISE 2.12 A code has minimum distance 3. Show that it is not possible to correct all 
single errors and detect all double errors. That is, show that there exist codewords a 
and b and a received word c, such that c comes from a via one error, and from b via 
two errors. 


2.3 LINEAR BINARY CODES 


We now consider a binary code C consisting of codewords $a = a_1 a_2 a_3 \ldots a_n$, where
each element $a_i$ is 0 or 1. Define the sum of two codewords as $c = a + b$, where
$c_i = a_i + b_i$, $i = 1, 2, \ldots, n$; that is, we add the bits term by term, and apply the
following rules:


$$0 + 0 = 0, \quad 1 + 0 = 1, \quad 0 + 1 = 1, \quad 1 + 1 = 0 \tag{2.11}$$


These rules are in fact addition modulo 2 (which can be interpreted as 
even+even=even, odd+even=odd, odd+odd=even, where ‘0’ stands for an 
even number and ‘1’ for an odd number). 


EXAMPLE 2.14


If 1101 and 1001 are two codewords in a code of length 4, then their sum is

1101 + 1001 = 0100

Note that this is not binary arithmetic. If 1101 and 1001 were regarded as binary
numbers instead of codewords, then their sum would be

   1101
 + 1001
 ------
  10110

where we ‘carry over’ as in familiar decimal arithmetic.
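
The contrast between the two kinds of addition can be seen in a few lines of Python. This is only a sketch; `add_words` is an illustrative name, and words are assumed to be strings of 0s and 1s of equal length.

```python
def add_words(a: str, b: str) -> str:
    """Bit-by-bit sum modulo 2, following the rules (2.11); nothing is carried."""
    return "".join(str((int(x) + int(y)) % 2) for x, y in zip(a, b))


# Codeword addition: each position is handled independently.
print(add_words("1101", "1001"))

# Ordinary binary arithmetic on the same digit strings, with carries.
print(bin(int("1101", 2) + int("1001", 2))[2:])
```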


A linear code C is one in which the sum of any two codewords is also a
codeword: that is, if a, b are in C then so is a + b. In particular, taking b = a shows
that any linear code always contains the zero word 0 (consisting only of zero bits)
since the ith bit of a + a is either


$$a_i + a_i = 1 + 1 = 0 \quad \text{or} \quad a_i + a_i = 0 + 0 = 0$$


EXAMPLE 2.15


(a) The code C={00, 10,01, 11} is linear because all possible sums of 
codewords are also codewords, that is 
10+01=11, 10+11=01, 01+11=10
00+10=10, 00+01=01, 00+11=11 


(b) The code C={0000, 0101, 1011} is not linear, because 


0101+ 1011=1110 
which is not a codeword. 


EXERCISE 2.13 Determine whether each of the following sets of codewords forms a 
linear code: 


(a) 000, 110, 100 
(b) 000, 100, 011, 111 
(c) 00000, 01110, 10111, 11001. 


An important reason for using linear codes is that there is an easy way of
calculating the minimum distance. To find out what that is, we first need to define the
weight w(a) of the codeword a as the number of ones in a. For example,
w(1101) = 3. If a and b are any two codewords belonging to a linear code, then in
their sum $c = a + b$ the ith bit $c_i = a_i + b_i$ is 1 if $a_i$ and $b_i$ differ, and $c_i = 0$ if $a_i$ and $b_i$
are the same. Hence the weight of c is simply equal to the number of places in which
a and b differ; in other words, we have proved that


$$w(a + b) = \delta(a, b) \tag{2.12}$$

In particular, if b = 0 in (2.12) then we get

$$w(a) = \delta(a, 0) \tag{2.13}$$

The key result can now be established.


Theorem 2.2 


For any linear code C, its minimum distance d is equal to the smallest non-zero
weight of codewords, that is


$$d = \min_{a \neq 0} w(a) = w_{\min}, \ \text{say} \tag{2.14}$$


Proof 


The argument is quite ingenious, and consists of showing that the minimum distance
d is not greater than $w_{\min}$, and also is not less than $w_{\min}$; hence d is ‘sandwiched’ on
both sides by $w_{\min}$ and so must be equal to it.

First, let c be a codeword for which $w(c) = w_{\min}$ (by definition, there must be at
least one such c). Since 0 is also a codeword, by definition of minimum distance we
have $d \leq \delta(c, 0)$. From (2.13) we have $\delta(c, 0) = w(c)$, so combining these two facts
produces $d \leq w_{\min}$ (i.e. d is not greater than $w_{\min}$).

Next, let f and g be two codewords which are distance d apart, i.e. $d = \delta(f, g)$.
However, the sum f + g is also a codeword (and is non-zero, because f and g are
different), so from (2.12) we have


$$\delta(f, g) = w(f + g) \geq w_{\min}$$


Hence $d \geq w_{\min}$, and this is the second half of the ‘sandwich’ argument which shows
that $d = w_{\min}$.

Armed with Theorem 2.2, we can now see why it is easy to determine the
minimum distance for a linear code. Instead of having to compute $\delta(a, b)$ for all
possible pairs of codewords a and b, and then finding the minimum of these
distances, we simply have to compute the minimum of the weights of all the
(non-zero) codewords.
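
Theorem 2.2 can be checked numerically on the codes met so far. The sketch below is illustrative only: the function names are assumptions, and the codes used are those of Example 2.15 and the repetition code of Example 2.12 (which is linear).

```python
from itertools import combinations


def add_words(a: str, b: str) -> str:
    """Bit-by-bit sum modulo 2."""
    return "".join(str((int(x) + int(y)) % 2) for x, y in zip(a, b))


def weight(a: str) -> int:
    """Number of ones in the word."""
    return a.count("1")


def is_linear(code: list[str]) -> bool:
    """True if the sum of any two codewords (including a + a) is a codeword."""
    words = set(code)
    return all(add_words(a, b) in words for a in code for b in code)


def minimum_weight(code: list[str]) -> int:
    """Smallest weight among the non-zero codewords."""
    return min(weight(a) for a in code if weight(a) > 0)


def minimum_distance(code: list[str]) -> int:
    """Smallest distance over all pairs, using w(a + b) = distance, as in (2.12)."""
    return min(weight(add_words(a, b)) for a, b in combinations(code, 2))


repetition_code = ["000000", "010101", "101010", "111111"]
print(is_linear(repetition_code))                 # the code is linear
print(minimum_weight(repetition_code))            # one pass over the codewords
print(minimum_distance(repetition_code))          # all pairs; same value
print(is_linear(["0000", "0101", "1011"]))        # Example 2.15(b) is not linear
```

The point of the comparison is cost: `minimum_weight` looks at each codeword once, whereas `minimum_distance` has to examine every pair.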


EXAMPLE 2.16


It is left as an easy task for the reader to check that the code
C={00000, 01110, 10001, 11111}


is linear. By inspection, the non-zero weights are 3, 2, 5 respectively, so the
minimum distance is 2.

EXERCISE 2.14 Verify that each of the following sets constitutes a linear code, and find
the minimum distance: 


(a) 00000000, 10101010, 01010101, 11111111 
(b) 00000, 00111, 01011, 01100, 10011, 10100, 11000, 11111. 


2.4 MATRIX REPRESENTATION

We would now like to be able to construct linear codes which correct all single
errors, and to do this we need to utilize matrix notation. Recall from (1.59) how to
multiply a matrix by a vector. For example, the product of a 2 × 3 matrix and a
column vector with three elements is

$$\begin{pmatrix} a_1 & a_2 & a_3 \\ a_4 & a_5 & a_6 \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix}
= \begin{pmatrix} a_1 b_1 + a_2 b_2 + a_3 b_3 \\ a_4 b_1 + a_5 b_2 + a_6 b_3 \end{pmatrix}$$

We shall only need binary matrices, whose elements are either 0 or 1,
and the arithmetic is carried out modulo 2, according to the rules in (2.11).


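EXAMPLE 2.17


Using the multiplication rule above we have

$$\begin{pmatrix} 1 & 1 & 0 & 1 & 1 \\ 0 & 1 & 1 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ 1 \\ 1 \\ 1 \end{pmatrix}
= \begin{pmatrix} 1 + 0 + 0 + 1 + 1 \\ 0 + 0 + 1 + 1 + 0 \end{pmatrix}
= \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$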


EXERCISE 2.15 Compute the following products, using modulo 2 arithmetic: 
(a)

$$\begin{pmatrix} 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}$$


00111 
00101 
01111 


(b)


Be HOF

Let $a = a_1 a_2 \ldots a_n$ and $b = b_1 b_2 \ldots b_n$ be any two codewords belonging to a code
we are trying to construct, and suppose H is a binary matrix with n columns. Let’s
write the codewords as column vectors, for which we shall use the notation

$$a' = \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_n \end{pmatrix}, \qquad
b' = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}$$

This is not to be confused with the transpose notation $[a_1, a_2, \ldots, a_n]^T$ which turns a
row vector into a column vector.

The crucial ploy is to choose the matrix H so that codewords satisfy the 
equations 

Ha'=0, Hb'=0 
Clearly if c=a+b, then 

Hc' = H(a' + b')=Ha' + Hb'=0 
which shows that c is also a codeword. But since c is the sum of a and b, this means
that the way we have set things up ensures the code is linear. That is to say, we have
characterized our linear binary code as the set of all codewords $x = x_1 x_2 x_3 \ldots x_n$
which satisfy $Hx' = 0$. The matrix H is called the parity-check matrix, or simply the
check matrix.


EXAMPLE 2.18


Suppose that we take as check matrix 


To determine the codewords $x = x_1 x_2 x_3$ for which H is the check matrix, the
simple-minded approach is to take all the possible binary words of length 3 and
see which of them satisfy $Hx' = 0$. Since each $x_i$ can be 0 or 1, there are $2^3 = 8$
possible words, namely


000, 100, 010, 001, 101, 110, 011, 111 


By direct multiplication we get


Ben Al} Alb BE) 
mal-(3} Male(ah Malls faf(8] 

Only the first and last of these products give the zero vector, so we conclude
that the only two codewords defined by this particular H are 000 and 111. Not a
very interesting code: we can only send the messages ‘yes’ and ‘no’!

A more systematic way of finding the codewords is illustrated by the next 
example. 


EXAMPLE 2.19 


Let's find all the codewords for the linear code determined by the check matrix

$$H = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix} \tag{2.15}$$

This requires us to solve

$$\begin{pmatrix} 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 1 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix}
= \begin{pmatrix} 0 \\ 0 \end{pmatrix}$$

which when written out in full becomes

$$\begin{aligned} x_1 + x_4 &= 0 \\ x_2 + x_3 + x_4 &= 0 \end{aligned} \tag{2.16}$$

These are called the check equations. The first one gives

$$x_1 = -x_4 = x_4 \tag{2.17}$$

Notice that we can write $-x_4 = x_4$ because with arithmetic modulo 2, as
expressed in (2.11), then

$$1 + 1 = 0, \quad 0 + 0 = 0$$

which is equivalent to stating that $-1 = 1$, $-0 = 0$. Thus for any binary x we have
$-x = x$, and so the second equation in (2.16) gives

$$x_2 = -x_3 - x_4 = x_3 + x_4 \tag{2.18}$$

We have now expressed $x_1$ and $x_2$ in terms of $x_3$ and $x_4$, which can be regarded
as the two independent variables. Since $x_3$ and $x_4$ can take the values 0 or 1,
there are four possibilities which can be represented in tabular form as follows:

$$\begin{array}{cc|cc}
x_3 & x_4 & x_1 & x_2 \\ \hline
0 & 0 & 0 & 0 \\
0 & 1 & 1 & 1 \\
1 & 0 & 0 & 1 \\
1 & 1 & 1 & 0
\end{array}$$

For each pair of values of $x_3$ and $x_4$, the corresponding values of $x_1$ and $x_2$ are
obtained from (2.17) and (2.18). For example, when $x_3 = 1$, $x_4 = 1$ then $x_1 = 1$,
$x_2 = 1 + 1 = 0$. The four codewords are therefore 0000, 1101, 0110, 1011.
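
The ‘simple-minded approach’ of trying every binary word is easy to automate, and gives an independent check on the codewords just found. The following Python sketch assumes the check matrix of Example 2.19 has rows 1001 and 0111; the function names are illustrative.

```python
from itertools import product


def syndrome(H, word):
    """Compute H x' with arithmetic modulo 2; H is a list of rows."""
    return tuple(sum(h * x for h, x in zip(row, word)) % 2 for row in H)


def codewords(H):
    """All binary words x of length n (the number of columns) with H x' = 0."""
    n = len(H[0])
    return [
        "".join(map(str, w))
        for w in product((0, 1), repeat=n)
        if not any(syndrome(H, w))
    ]


H = [
    [1, 0, 0, 1],
    [0, 1, 1, 1],
]
print(codewords(H))  # ['0000', '0110', '1011', '1101']
```

The search looks at all 2^n words, so it is only practical for short codes; solving the check equations, as in the example, scales far better.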

EXERCISE 2.16 Determine all the codewords for the linear code having check matrix


0 
0 
1 
0 


Koco 
oror 


0 11 
1 5 
0 WS 
0 01 


We are now ready to develop a general procedure for constructing a linear code.
The equations (2.16) in Example 2.19 were easy to solve because $x_1$ appeared only in
the first one, and $x_2$ only in the second one. This is because the first two columns of
H in (2.15) consist of $I_2$, the 2 × 2 unit matrix. Similarly, for the matrix H in
Exercise 2.16 the first four columns consist of $I_4$. In general, we can take as check
matrix the r × n matrix

$$H = [\, I_r \;\; A \,] \tag{2.19}$$

which has r rows and n columns. In (2.19) $I_r$ is the r × r unit matrix (defined in
Section 1.4, Chapter 1) having ones on the principal diagonal and zeros elsewhere,
and A is an arbitrary r × (n − r) binary matrix whose element in row i, column j, we
denote by $a_{ij}$. Codewords $x_1 x_2 \ldots x_n$ have length n and satisfy the condition $Hx' = 0$.
For simplicity let’s write out the case r = 3, n = 5:

$$\left( \begin{array}{ccc|cc}
1 & 0 & 0 & a_{11} & a_{12} \\
0 & 1 & 0 & a_{21} & a_{22} \\
0 & 0 & 1 & a_{31} & a_{32}
\end{array} \right)
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \end{pmatrix} = 0$$

where the first three columns form $I_3$ and the last two form A. We can write the
check equations as

$$\begin{aligned}
x_1 + a_{11} x_4 + a_{12} x_5 &= 0 \\
x_2 + a_{21} x_4 + a_{22} x_5 &= 0 \\
x_3 + a_{31} x_4 + a_{32} x_5 &= 0
\end{aligned}$$

As before, because we are using arithmetic modulo 2 we can rewrite these as

$$\begin{aligned}
x_1 &= a_{11} x_4 + a_{12} x_5 \\
x_2 &= a_{21} x_4 + a_{22} x_5 \\
x_3 &= a_{31} x_4 + a_{32} x_5
\end{aligned}$$

or in matrix notation

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = A \begin{pmatrix} x_4 \\ x_5 \end{pmatrix} \tag{2.20}$$

The variables $x_4$ and $x_5$ are the independent or information bits which we can choose
arbitrarily (depending on the message to be transmitted). The check bits $x_1$, $x_2$, $x_3$ are
then determined uniquely from (2.20). The general version of (2.20) is

$$\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_r \end{pmatrix}
= A \begin{pmatrix} x_{r+1} \\ x_{r+2} \\ \vdots \\ x_n \end{pmatrix} \tag{2.21}$$

and there are n − r information bits $x_{r+1}, x_{r+2}, \ldots, x_n$ and r check bits $x_1, x_2, \ldots, x_r$.

Since each bit is 0 or 1, there are $2^{n-r}$ different ways of choosing the values of the
information bits, so there are $2^{n-r}$ codewords in total. For this reason the quantity
$k = n - r$ is called the dimension of the code. Notice that if the order of the columns
of H in (2.19) is altered, then this simply alters the order of the bits in the codewords
in the corresponding way. In particular, some books use $H = [\, A \;\; I_r \,]$.
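
Equation (2.21) is in effect an encoding rule: multiply the information bits by A to obtain the check bits, then place the check bits in front. A minimal Python sketch follows, using the matrix A that corresponds to Example 2.19 (rows 01 and 11, read off from (2.17) and (2.18)); the name `encode` is an illustrative choice.

```python
from itertools import product


def encode(A, info):
    """Return the codeword x_1 ... x_n for the given information bits.

    A is the r x (n - r) block of H = [I_r  A]; the check bits are A times the
    information bits, with arithmetic modulo 2, as in (2.21).
    """
    check = [sum(a * x for a, x in zip(row, info)) % 2 for row in A]
    return check + list(info)


A = [
    [0, 1],  # x_1 = x_4
    [1, 1],  # x_2 = x_3 + x_4
]

# One codeword for each of the 2^(n-r) choices of information bits.
for info in product((0, 1), repeat=len(A[0])):
    print(info, "->", "".join(map(str, encode(A, info))))
```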


EXAMPLE 2.20


(a) Go back to Example 2.19. We see that in (2.15)

$$H = [\, I_2 \;\; A \,], \qquad A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$

and n = 4, r = 2, so the code has dimension 2 and $2^2 = 4$ codewords.
From (2.21) we obtain

$$\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}
= \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x_3 \\ x_4 \end{pmatrix}$$

giving the expressions (2.17) and (2.18).


(b) Let's determine the codewords for the linear code having check matrix

$$H = \left( \begin{array}{ccc|cccc}
1 & 0 & 0 & 1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 & 1 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 & 1
\end{array} \right) = [\, I_3 \;\; A \,]$$

Here r = 3, n = 7 and the check equations (2.21) are

$$\begin{aligned}
x_1 &= x_4 + x_6 + x_7 \\
x_2 &= x_4 + x_5 + x_6 \\
x_3 &= x_4 + x_5 + x_7
\end{aligned} \tag{2.22}$$

The dimension of the code is 7 − 3 = 4, so there are $2^4 = 16$ codewords.
As in Example 2.19, for each possible set of values of the information
bits the values of the check bits given by (2.22) can be conveniently
expressed in tabular form, and six of the codewords are listed below:


Check bits Information bits 
x % X Xs Xs x x 


sa 3153 6 
oos-320 
o-0--+0 
=csoo30 
-=co-00 
co-c000 
o-coo°o 


EXERCISE 2.17 Determine the remaining 10 codewords in the preceding example by 
using (2.22). Verify that the codewords satisfy Hx' =0. 


Our next task is to investigate the error detection and correction properties of a 
code in terms of its check matrix. Recall that for a code to be able to detect all single 
errors it must have a minimum distance d of at least 2. In view of Theorem 2.2, this 
means that there must be no codewords of weight 1, since d is equal to the minimum 
of all the weights of (non-zero) codewords. Suppose e is a word of weight 1, and so
has just a single non-zero bit in (say) the ith position. Hence for e not to be a
codeword, we must have $He' \neq 0$. However, because e' is a column vector with a
single 1 in the ith element, the vector He' is just the ith column of H, and this must
therefore not equal zero for any value of i from 1 to n. We have therefore proved
the following theorem.


Theorem 2.3 


H is the check matrix for a single-error-detecting linear binary code if and only if it 
does not contain a zero column. 


EXAMPLE 2.21


If the check matrix is 


=/1010 a 
tare 


then because the fourth column of H is zero, a codeword of weight 1 is 0001 (the
1 is in the fourth place). Hence d = 1, and the code does not detect single errors.

EXERCISE 2.18 Theorem 2.3 states that any r × n binary matrix without a column of
zeros will be the check matrix for a single-error-detecting code. Suppose that H is
taken to be in the form (2.19), and write down a suitable H with four check bits and
three information bits. Determine all the codewords and the minimum distance (notice
that this can turn out to be greater than 2).


Now let’s move on to correction of single errors. We recall that in this case we
must have $d \geq 3$, so that there must be no codewords having weight 2; that is, for any
word f having exactly two non-zero bits in positions i and j (say) we must have
$Hf' \neq 0$. But the product $Hf'$ in this case is equal to $h_i + h_j$, the sum of the ith and
jth columns of H. For example, if n = 5, i = 2, j = 4 we would get

$$Hf' = H \begin{pmatrix} 0 \\ 1 \\ 0 \\ 1 \\ 0 \end{pmatrix} = h_2 + h_4$$

We therefore require $h_i + h_j \neq 0$, or $h_i \neq h_j$ (since $-h_j = h_j$). Clearly the condition of
Theorem 2.3 for $d \geq 2$ must also hold, so we have proved the following theorem.

Theorem 2.4

H is the check matrix for a single-error-correcting (s.e.c.) linear binary code if and
only if no two columns of H are equal, and no column is zero.

EXAMPLE 2.22


It is now very simple to write down a check matrix which will produce an s.e.c.
code. For example, if there are three check bits and three information bits then
n = 3 + 3 = 6, and a suitable check matrix in the form (2.19) is

$$H = \begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 1
\end{pmatrix}$$

This choice is not unique; for example, any of the columns could be replaced by

$$\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$$

The only requirement is that all the columns are non-zero and different from
each other.

If, however, we consider

$$H = \begin{pmatrix}
1 & 0 & 0 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 0 & 1
\end{pmatrix}$$

then we see that the first and fifth columns are identical. Hence 100010 is a
codeword of weight 2, confirming that this second matrix H cannot produce an
s.e.c. code since $d < 3$.
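
The condition of Theorem 2.4 is purely a statement about the columns of H, so it can be tested directly. The sketch below is illustrative (the function names are assumptions); the two matrices are written out explicitly: the first has six distinct non-zero columns, while the second has equal first and fifth columns, as in the second half of Example 2.22.

```python
def columns(H):
    """Columns of H (given as a list of rows) as tuples."""
    return [tuple(row[j] for row in H) for j in range(len(H[0]))]


def is_single_error_correcting(H) -> bool:
    """Theorem 2.4: no column is zero and no two columns are equal."""
    cols = columns(H)
    no_zero_column = all(any(c) for c in cols)
    all_distinct = len(set(cols)) == len(cols)
    return no_zero_column and all_distinct


H_good = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0, 1],
]
H_bad = [
    [1, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0, 1],
]
print(is_single_error_correcting(H_good))  # True
print(is_single_error_correcting(H_bad))   # False: columns 1 and 5 coincide
```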


EXERCISE 2.19 Confirm that

$$H = \begin{pmatrix}
1 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 0
\end{pmatrix}$$

cannot be used as the check matrix for an s.e.c. code by (a) writing down a codeword
of weight 2, and (b) determining two non-zero codewords which are distance 2 apart.


Let’s now look more closely into the structure of check matrices which produce
s.e.c. codes. If there are only two check bits then the only possible check matrix
satisfying the conditions in Theorem 2.4 is

$$H = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \tag{2.23}$$

(apart from a permutation of the columns, which as we have remarked earlier merely
permutes the order of the bits). If r = 3, then we have from (2.19)

$$H = [\, I_3 \;\; A \,]$$

To satisfy the conditions of Theorem 2.4 we must exclude the zero column and the
columns of $I_3$ from A, so the only possibilities for the columns of A are

$$\begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad
\begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad
\begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix}, \quad
\begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$

By selecting one, two, three or all four of these columns we produce codes of lengths
4, 5, 6 or 7 respectively. In general, with r check bits and length n we have from (2.19)

$$H = [\, \underbrace{I_r}_{r} \;\; \underbrace{A}_{n-r} \,] \tag{2.24}$$

where H has r rows, $I_r$ has r columns and A has n − r columns. There are in total
$2^r$ possible columns to select for A, since each of the r elements in a column can be
0 or 1. However, in order to obtain the s.e.c. property we must exclude the zero
column and the r columns of $I_r$ from A. We are therefore left with at most
$2^r - r - 1$ columns for A. Looking at (2.24), we see that

$$n - r \leq 2^r - r - 1$$

showing that the length of codewords satisfies the condition $n \leq 2^r - 1$. When
equality holds the code is called perfect.

EXAMPLE 2.23


When r = 2 we have $n \leq 2^2 - 1 = 3$, as is apparent in (2.23). When r = 3 we get
$n \leq 2^3 - 1 = 7$, which again confirms the discussion above.
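
The bound n ≤ 2^r − 1 can be turned round to give the least number of check bits needed for a given number k of information bits, since n = k + r. The following Python sketch simply searches upwards from r = 1; the function names are hypothetical.

```python
def max_length(r: int) -> int:
    """Greatest length of an s.e.c. linear binary code with r check bits."""
    return 2 ** r - 1


def min_check_bits(k: int) -> int:
    """Smallest r for which k + r <= 2^r - 1."""
    r = 1
    while k + r > max_length(r):
        r += 1
    return r


for k in (1, 3, 4, 5):
    r = min_check_bits(k)
    print(f"k = {k}: r = {r}, length n = {k + r}")
```

For k = 1 and k = 4 the resulting lengths are 3 and 7, the perfect cases of Example 2.23.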


EXERCISE 2.20 For linear s.e.c. codes, how many check bits are needed with (a) 20 and 
(b) 32 information bits? 


EXERCISE 2.21 Using a suitable check matrix, list the codewords for an s.e.c. code with 
two information bits and the smallest possible number of check bits. 


EXERCISE 2.22 A first-year class in the School of Humorous Studies contains 59 students, 
and it is decided to assign to each an identity number in the form of a binary word. 

(a) What is the least possible number of information bits of a linear code used 
for this purpose? (It is assumed that some codewords will be left over 
unused. ) 

(b) If the code must be capable of correcting all single errors, find the least 
possible length of codewords. 

(c) Write down any suitable check matrix for such a code.


We now know how to construct an s.e.c. code for given numbers r and k of check 
bits and information bits respectively by choosing any suitable check matrix H in the 
form (2.24). In order to encode a given information message, we obtain the r check 
bits from (2.21). However, to decode we have so far simply compared a received word 
m, say, with the set of all codewords. By appealing to the NN principle we then select 
the codeword nearest to m as that most likely to have been transmitted. This is a tedious 
procedure, and in fact a much simpler decoding procedure can be developed using the 
check matrix H. Suppose that the received message is $m = c + e$, where c is a
codeword, and e represents a single error in the ith bit, so

$$e = 00 \ldots 010 \ldots 0$$

with the single 1 in the ith position. By definition of the check matrix we have
$Hc' = 0$, so

$$Hm' = H(c' + e') = Hc' + He' = He'$$

However, because the column vector e' formed from e has a single 1 in the ith
element, we saw earlier that He' is just the ith column of H. We have therefore
established for our s.e.c. code the following theorem.


Theorem 2.5 


If a single error occurs in transmission then Hm' is equal to some column (the ith, 
say) of H; and the error is in the ith bit. 




Because of its role in determining the error, the vector Hm’ formed by 
multiplying the check matrix by the vector of the received word is called the 
syndrome of m (derived from the medical usage of this word where it means
‘symptom’).


Syndrome decoding algorithm

Step 1 Compute $s = Hm'$.

Step 2 If s = 0, assume m is a codeword and no transmission error has occurred.

Step 3 If s = ith column of H, a single transmission error occurred in the ith bit.

Step 4 If s ≠ 0 and s is not equal to any column of H, then more than one error
occurred in transmission.


EXAMPLE 2.24


Consider the matrix 


which is the check matrix for an s.e.c. code, since it satisfies the conditions of 
Theorem 2.4: all columns are non-zero and different from each other. Suppose a 
received word is m= 11110; then the syndrome is 


which is the second column of H. We deduce from Step 3 of the algorithm that
there is an error in the second bit, so the correctly decoded message is
c = 10110. It is easy to check that Hc' = 0, verifying that c is indeed a codeword.

Suppose that a second received word is m=00111. In this case the 
syndrome is 


which is not a column of H, so by Step 4 we conclude that more than one error
has occurred in transmission. In fact, as the reader can check, in this case
possible transmitted codewords are 10110, with errors in the first and last
places; or 01011 with errors in the second and third places; or 11101 with errors
in the first, second and fourth places. Our decoding algorithm cannot determine
the transmitted codeword in this case.
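
The four steps of the syndrome decoding algorithm translate directly into code. The Python sketch below is illustrative only. In particular, the matrix H used here is an assumption: it is the check matrix in the standard form (2.19) whose codewords are 00000, 10110, 01011 and 11101, chosen to agree with the codewords quoted in Example 2.24, and it is not necessarily the matrix printed there. Returning `None` in Step 4 is also a choice made for this sketch.

```python
def syndrome(H, m):
    """Step 1: compute s = H m' with arithmetic modulo 2."""
    return tuple(sum(h * x for h, x in zip(row, m)) % 2 for row in H)


def syndrome_decode(H, received: str):
    """Return the decoded codeword, or None if more than one error is detected."""
    m = [int(bit) for bit in received]
    s = syndrome(H, m)
    if not any(s):
        return received                      # Step 2: s = 0, accept m as it stands
    cols = [tuple(row[j] for row in H) for j in range(len(m))]
    if s in cols:
        i = cols.index(s)                    # Step 3: s is column i (0-based here)
        m[i] ^= 1                            # correct the single error
        return "".join(map(str, m))
    return None                              # Step 4: s is non-zero and not a column


# Assumed check matrix in the form [I_3  A]; see the note above.
H = [
    [1, 0, 0, 1, 1],
    [0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1],
]
print(syndrome_decode(H, "11110"))  # single error in the second bit
print(syndrome_decode(H, "00111"))  # more than one error: no decision
```

Since H has distinct non-zero columns, `cols.index(s)` can match at most one column, which is exactly why Theorem 2.4 is needed for Step 3 to be unambiguous.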


EXERCISE 2.23 Consider the code having check matrix 


10100 
11010 
01001 


(a) A codeword is transmitted, and a single error occurs in the second bit. What 
is the syndrome? 
(b) Decode each of the received messages: 11110, 11101, 10111. 


EXERCISE 2.24 Consider the code with check matrix 


(a) Decode the received words 111001, 110111. 

(b) Show that if 111111 is received then more than one error has occurred in 
transmission. Find two possible codewords which could have been 
transmitted with two errors occurring, and a codeword which could have 
been transmitted with three errors. 


EXERCISE 2.25 If a codeword is transmitted, and errors occur in bits i and j, show that 
the syndrome is the sum of columns i and j of the check matrix. 


EXERCISE 2.26 If H is not the check matrix for an s.e.c. code, then of course the 
decoding algorithm breaks down. Verify this by considering the check matrix 


and a received word 01110. 


2.5 HAMMING CODES 


An important family of s.e.c. codes was discovered by the American mathematician 
and computer scientist Hamming around 1950. 


EXAMPLE 2.25


Suppose we wish to construct a check matrix H for an s.e.c. code with length
n = 6 and three check bits. One natural way of selecting columns of H so that
they satisfy the conditions of Theorem 2.4 is to write down the binary number
representation of the integers 1 to 6, namely

$$\begin{array}{cccccc}
001 & 010 & 011 & 100 & 101 & 110 \\
1 & 2 & 3 & 4 & 5 & 6
\end{array}$$

Obviously these binary numbers are all different from each other and none is
zero. Therefore, if we write them as the columns of a matrix

$$H = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 0 & 1 & 0 & 1 & 0
\end{pmatrix} \tag{2.25}$$

this does indeed comply with the requirements in Theorem 2.4, and so is the
check matrix for an s.e.c. code. Notice that H in (2.25) no longer has the
standard form (2.24) where the first three columns are those of the 3 × 3 unit
matrix $I_3$. However, this doesn’t matter; it simply means that the check bits are
in the positions where the columns of $I_3$ appear in (2.25), namely positions 1, 2
and 4.

Recall (Theorem 2.5) that if m is a received word and the syndrome $s = Hm'$
is equal to the ith column of H, this shows that there is a single error in the ith
bit. However, because of the way we have constructed H, its ith column is just
the number i in binary form, so the syndrome s itself has given us the bit which
is in error, without having to compare s with the columns of H. This is the
clever idea behind Hamming codes, which makes them particularly easy to
decode. For example, if m = 101001 then using the matrix H in (2.25), the
syndrome is

$$s = Hm' = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$

This corresponds to 100 (we use the notation $s \sim 100$) which is the binary
representation of 4. Hence the error is in the fourth bit, and the transmitted
codeword is therefore 101101. Leaving out the check bits $x_1$, $x_2$ and $x_4$, the
information message is therefore 101.

To obtain the codewords $x = x_1 x_2 x_3 x_4 x_5 x_6$ we use the check equations
$Hx' = 0$, which by (2.25) become

$$x_1 = x_3 + x_5, \qquad x_2 = x_3 + x_6, \qquad x_4 = x_5 + x_6$$

Following the procedure described in Examples 2.19 and 2.20, by taking all
possible combinations of the information bits $x_3$, $x_5$ and $x_6$ we can construct
the whole set of codewords. For example, if $x_3 = 1$, $x_5 = 1$, $x_6 = 0$ then

$$x_1 = 1 + 1 = 0, \quad x_2 = 1 + 0 = 1, \quad x_4 = 1 + 0 = 1$$


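and the codeword is 011110. Since each information bit can be independently 0
or 1, there are $2^3 = 8$ possibilities, and we get the following table:

$$\begin{array}{ccc|ccc|c}
\multicolumn{3}{c|}{\text{Check bits}} & \multicolumn{3}{c|}{\text{Information bits}} & \\
x_1 & x_2 & x_4 & x_3 & x_5 & x_6 & \text{Weight} \\ \hline
0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 1 & 0 & 0 & 1 & 3 \\
1 & 0 & 1 & 0 & 1 & 0 & 3 \\
1 & 1 & 0 & 0 & 1 & 1 & 4 \\
1 & 1 & 0 & 1 & 0 & 0 & 3 \\
1 & 0 & 1 & 1 & 0 & 1 & 4 \\
0 & 1 & 1 & 1 & 1 & 0 & 4 \\
0 & 0 & 0 & 1 & 1 & 1 & 3
\end{array}$$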


The weight of each codeword is indicated in the final column. Since the 
smallest non-zero value is 3, Theorem 2.2 tells us that for this Hamming code 
with r=3, n=6 the minimum distance is d=3. Hence (by Theorem 2.1) this 
confirms that the code corrects all single errors. 

Notice, however, that in this example if m = 000111 then the syndrome is

$$s = Hm' = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \sim 111$$

which is the binary representation of 7, but the code has length only 6, so
there cannot be an error in the seventh bit! What this means is that more than
one transmission error has occurred.


Let’s now look at the properties of a general Hamming code having length n,
dimension k and $r = n - k$ check bits (it is called an [n, k] code).


(i) The r × n check matrix H has as its ith column the number i written in
binary form, with i = 1, 2, ..., n.
(ii) Since the columns of H satisfy the conditions of Theorem 2.4, it is the
check matrix for an s.e.c. code.
(iii) The check bits are in the positions where the columns of H contain a
single 1, that is positions 1, 2, 4, 8, ..., $2^{r-1}$.
(iv) For n > 4, the first few columns of H are

$$\begin{pmatrix}
0 & 0 & 0 & 0 & 0 & \cdots \\
\vdots & \vdots & \vdots & \vdots & \vdots & \\
0 & 0 & 0 & 0 & 0 & \cdots \\
0 & 0 & 0 & 1 & 1 & \cdots \\
0 & 1 & 1 & 0 & 0 & \cdots \\
1 & 0 & 1 & 0 & 1 & \cdots
\end{pmatrix}$$

Starting from the bottom row, these correspond to the check equations

$$x_1 = x_3 + x_5 + \cdots, \qquad x_2 = x_3 + x_6 + \cdots, \qquad x_4 = x_5 + x_6 + \cdots$$

When we construct the table listing the codewords, then the particular
choice $x_3 = 1$, all other information bits zero, gives $x_1 = 1$, $x_2 = 1$, all other
check bits zero.

Hence 111000...0 is always a codeword, with weight 3. We knew
already that $d \geq 3$ since the code corrects all single errors, so we have now
shown that Hamming codes have minimum distance exactly equal to 3.

(v) As we saw in the discussion following Theorem 2.4, the length n of any
s.e.c. linear binary code is at most $2^r - 1$. When $n = 2^r - 1$ the Hamming
code is called perfect, otherwise it is shortened.

(vi) For a perfect Hamming code, every syndrome (except 0) occurs as a single
column of the check matrix, and so represents a single correctable error. For
a shortened Hamming code, some syndromes represent multiple errors.


EXAMPLE 2.26


To construct the perfect [7,4] Hamming code, we append onto the matrix H in
(2.25) an extra column obtained from the binary representation 111 of 7. This
produces the new check matrix

$$H = \begin{pmatrix}
0 & 0 & 0 & 1 & 1 & 1 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 & 1 \\
1 & 0 & 1 & 0 & 1 & 0 & 1
\end{pmatrix} \tag{2.26}$$

If a received message is m = 1111001, then the syndrome is

$$s = Hm' = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} \sim 011$$

Since 011 is the binary representation of 3, we deduce that there is an error in 
the third bit, so the transmitted codeword is 1101001. Discarding the check bits 
in positions 1, 2, 4 leaves the decoded information message as 0001. 
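
Because the ith column of a Hamming check matrix is the number i in binary, the syndrome Hm' is the modulo 2 sum of the binary forms of the positions holding a 1. Read as an integer, that is the bitwise exclusive-or of those positions, so the matrix never has to be written down. The Python sketch below relies on this observation; the function names are illustrative, and returning `None` for an out-of-range syndrome is a choice made here for shortened codes.

```python
def hamming_syndrome(received: str) -> int:
    """Syndrome as an integer: XOR of the (1-based) positions that hold a 1."""
    s = 0
    for position, bit in enumerate(received, start=1):
        if bit == "1":
            s ^= position
    return s


def hamming_decode(received: str):
    """Correct a single error; None if the syndrome exceeds the word length."""
    s = hamming_syndrome(received)
    if s == 0:
        return received
    if s > len(received):
        return None                      # shortened code: more than one error
    bits = list(received)
    bits[s - 1] = "1" if bits[s - 1] == "0" else "0"
    return "".join(bits)


def information_bits(codeword: str) -> str:
    """Drop the check bits, which sit in positions 1, 2, 4, 8, ... (powers of 2)."""
    return "".join(
        bit
        for position, bit in enumerate(codeword, start=1)
        if (position & (position - 1)) != 0
    )


# Example 2.25 (length 6) and Example 2.26 (perfect [7,4] code).
print(hamming_decode("101001"), information_bits(hamming_decode("101001")))
print(hamming_decode("1111001"), information_bits(hamming_decode("1111001")))
print(hamming_decode("000111"))  # syndrome 7 in a code of length 6
```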


EXERCISE 2.27 
(a) Give the check matrix for the Hamming code with six information bits and 
four check bits. 
(b) Encode the information messages (i) 101101, (ii) 001011. 
(c) Decode the received words (i) 0001110101, (ii) 0000101100, 
(iii) 1011110111, giving if possible the information messages which were 
sent. 


EXERCISE 2.28 Starting with the check matrix in the previous exercise, write down the 
check matrix for the perfect Hamming code with four check bits. Decode the received 
word 101111011100000. How does this compare with (iii) in part (c) of the previous 
exercise? 
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An interesting geometrical representation can be given for the perfect Hamming 
code with three check bits, whose check matrix is set out in (2.26). This takes the 
form of a Venn diagram, shown in Figure 2.6. The information bits x;, x5, 5, x7 are 
located in the four central compartments of the diagram, indicated with hatched 
lines. The check bits x,, x,, x, are in the outer compartments. The check equations 
Hx' =0 with H in (2.26) are 


X4tX5+X%_ +x, =0 
X_ + X%yt+X%p+xX,=0 


Xy +3 +X5 +x, =0 


However, the sum of the bits inside circle A in Figure 2.6 is precisely 
X4 + X5 + X5 +7, which by the first check equation is zero; in other words, there must 
be an even number of ones inside circle A. You can easily confirm that the same 
thing holds for circles B and C, using the second and third check equations 
respectively. We can use this Venn diagram, in which each of the circles has even 
parity, both for encoding and decoding. For example, if the information message is 
X3XsXeX7 = 1011 then we obtain Figure 2.7(a). It is easy to see that the only way for 
each circle to contain an even number of ones must be as shown in Figure 2.7(b). 
Hence x, =0, x, =1, x, =0 and the codeword is 0110011. 

Suppose now that this codeword is transmitted, and an error occurs in the third 
bit. The received word 0100011 is shown in Figure 2.8. Both circles B and C contain 
an odd number of ones whereas A contains an even number of ones. We therefore 
deduce that the error lies at the intersection of circles B and C only. From Figure 2.6 
we see that this intersection is $x_3$, so we conclude (correctly) that an error has 
occurred in the third bit. 
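
The parity argument above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CIRCLES`, `encode` and `decode` are invented here, and the membership of each circle is read off from the three check equations rather than from Figure 2.6 itself.

```python
# Bit positions (1-based) lying inside each circle of the Venn diagram.
# A, B and C correspond to the first, second and third check equations.
CIRCLES = {
    "A": (4, 5, 6, 7),
    "B": (2, 3, 6, 7),
    "C": (1, 3, 5, 7),
}


def encode(info):
    """Put the information bits in positions 3, 5, 6, 7 and fill in check bits 1, 2, 4."""
    x = [0] * 8  # index 0 is unused so that x[p] is bit p
    x[3], x[5], x[6], x[7] = info
    x[4] = (x[5] + x[6] + x[7]) % 2  # makes circle A even
    x[2] = (x[3] + x[6] + x[7]) % 2  # makes circle B even
    x[1] = (x[3] + x[5] + x[7]) % 2  # makes circle C even
    return x[1:]


def decode(word):
    """Correct at most one error by finding the circles with odd parity."""
    x = [0] + list(word)
    odd = {name for name, pos in CIRCLES.items() if sum(x[p] for p in pos) % 2}
    if odd:
        # The faulty bit is the one lying in exactly the odd circles.
        for p in range(1, 8):
            if {name for name, pos in CIRCLES.items() if p in pos} == odd:
                x[p] ^= 1
                break
    return x[1:]


codeword = encode([1, 0, 1, 1])
repaired = decode([0, 1, 0, 0, 0, 1, 1])
```

With the information message 1011 the sketch reproduces the codeword 0110011, and the received word 0100011 is repaired by flipping the third bit.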


Figure 2.6 


Figure 2.7 (a), (b) 


Figure 2.8 


Notice, again by referring to Figure 2.6, that a single error in one of the check 
bits $x_1$, $x_2$ or $x_4$ is immediately recognized, since only one of the circles has its parity 
upset. 


EXERCISE 2.29 Repeat the decoding problem in Example 2.26 using the Venn diagram 
representation. 


EXERCISE 2.30 Use the Venn diagram representation to construct the set of codewords 
for the perfect [7,4] Hamming code. Suppose that the codeword 0011001 is 
transmitted, and that errors occur in the second and sixth bits. Investigate what 
happens if the Venn diagram method is used to decode the received word. 


An important application of Hamming codes is to improve dramatically the 
reliability of computer memories. Unavoidable errors occur in individual storage 
cells owing to stray radiation, which arises because the plastic packaging of memory 
chips contains small amounts of radioactive materials. For example, consider a 
memory bank consisting of 128 silicon 64K chips. Each chip contains $64 \times 2^{10} = 65\,536$ 
data storage cells, so that there is a total of $128 \times 65\,536 = 8\,388\,608$ such 
cells; in other words, the memory holds about a million 8 bit words (actually, by 
today’s standards for computer memories this number is quite small). Each memory 
cell in a 64K chip is incredibly reliable, having a mean life before failure of over a 
million years. Unfortunately, because of radiation which can alter the contents of an 
individual memory cell, the mean time before failure of the memory bank itself is 
about 


$$1\ \text{million years} \div 8\,388\,608 \approx 43\ \text{days}$$


Such a short failure time is completely unacceptable for practical purposes, but 
there is no feasible way of shielding the computer memory from radiation damage. 
The solution to this problem is provided by incorporating a Hamming code 
(actually a [64,57] code with seven check bits) into the memory bank — this 
requires about 20% more capacity. This enlarged memory would be even more 
prone to errors (as there are more cells to be hit by radiation), the mean time to 
failure now being 43/1.2=36 days. However, thanks to the single-error correction 
provided by the Hamming code it can be worked out that the frequency of errors 
in the memory comes down to about once every 63 years (yes, years!). This 
remarkable improvement again illustrates that the wonders of contemporary 
electronics rely heavily on the mathematician’s crucial contribution of error- 
correcting codes. 
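
The two failure times quoted above follow from simple division, which the following Python sketch reproduces. The constant and function names are invented for illustration, and a year is assumed to be 365.25 days; the figure of 63 years with error correction depends on a calculation not given in the text, so it is not attempted here.

```python
CELLS = 128 * 64 * 2**10          # 128 chips of 64K cells each
CELL_LIFE_YEARS = 1_000_000       # mean life of a single cell
DAYS_PER_YEAR = 365.25            # assumption made for this sketch


def mean_days_to_failure(cells, cell_life_years=CELL_LIFE_YEARS):
    """Mean time before some cell in the bank fails, in days."""
    return cell_life_years * DAYS_PER_YEAR / cells


plain = mean_days_to_failure(CELLS)   # memory without check bits
enlarged = plain / 1.2                # about 20% more cells for the Hamming code
```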


2.6 DECIMAL CODES 


After concentrating on binary codes, it’s time to take a closer look at decimal codes, 
which were introduced in Section 2.1. Let’s return to the International Standard Book 
Number (ISBN), discussed in Example 2.8. Recall that this consists of a 10 digit 
codeword $x_1 x_2 \ldots x_{10}$ which uniquely defines a book. The last digit $x_{10}$ is the check 
digit. This is chosen so that the check sum, defined by 

$$S = \sum_{i=1}^{10} i x_i = x_1 + 2x_2 + 3x_3 + \cdots + 9x_9 + 10x_{10} \tag{2.27}$$

which is called the weighted sum of the digits, is a multiple of 11, that is 

$$S \equiv 0 \pmod{11}$$

The digits $x_1, x_2, \ldots, x_9$ can take any of the values 0, 1, 2, ..., 9 but the check digit 
$x_{10}$ can in addition take the value 10, which is denoted by the Roman numeral X. 
Setting the sum in (2.27) equal to zero shows that 

$$-10x_{10} = x_1 + 2x_2 + \cdots + 9x_9$$

and because our arithmetic is modulo 11, $-10x_{10} = x_{10}$ (since $11x_{10} = 0$). Hence 
for a given book number $x_1 \ldots x_9$ the check digit can be computed from 

$$x_{10} = \sum_{i=1}^{9} i x_i \pmod{11} \tag{2.28}$$

A simple way of evaluating the check digit in (2.28) is suggested in Problem 2.8 at 
the end of the chapter. 
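
As a rough illustration of (2.27) and (2.28), the following Python sketch computes a check digit and tests a complete ISBN. The function names are invented here; the only facts used are the weighted sum modulo 11 and the convention that the value 10 is written as X.

```python
def isbn10_check_digit(first_nine):
    """Check digit from (2.28): the weighted sum of nine digits, modulo 11."""
    total = sum(i * int(d) for i, d in enumerate(first_nine, start=1)) % 11
    return "X" if total == 10 else str(total)


def isbn10_is_valid(isbn):
    """True if the check sum S of (2.27) is a multiple of 11 (hyphens are ignored)."""
    digits = [10 if ch == "X" else int(ch) for ch in isbn if ch != "-"]
    weighted = sum(i * d for i, d in enumerate(digits, start=1))
    return len(digits) == 10 and weighted % 11 == 0


digit = isbn10_check_digit("019859665")
ok = isbn10_is_valid("0-19-859665-0")
```

For the book number 0-19-859665-0 of Figure 2.2 the weighted sum of the first nine digits is 275, a multiple of 11, so the check digit is 0.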

In order to test whether a given book number is correct, that is a valid ISBN for 
a book, a simple tabular method was devised for use by librarians. The easiest way to 
understand this is to first consider a numerical example. 


@ EXAMPLE 2.27 


Consider the ISBN in Figure 2.2, namely 0-19-859665-0. Note that the hyphens 
are inserted purely for convenience in breaking up the book numbers into 
blocks, and have no mathematical significance. Write the book number in a 
vertical column denoted by $c_1$, and construct two columns $c_2$ and $c_3$ to the right 
as shown in Table 2.1. 


The rules for constructing the table are as follows: 


(i) The three entries in the first row are identical. 
(ii) Suppose that at some stage the entries in a row are $a_1$, $a_2$, $a_3$. Then 
the row below it, $b_1$, $b_2$, $b_3$, is obtained from 

$$\begin{array}{lll}
a_1 & a_2 & a_3 \\
b_1 & b_2 = b_1 + a_2 & b_3 = b_2 + a_3
\end{array}$$

The application of rule (ii) is indicated by arrows in Table 2.1. 
The ISBN is correct if the last entry is a multiple of 11 (that is, 0 (mod 11)), 
so this final entry (circled in Table 2.1) represents an alternative check sum. 


Table 2.1 

c_1   c_2   c_3 
0     0     0 
1     1     1 
9     10    11 
8     18    29 
5     23    52 
9     32    84 
6     38    122 
6     44    166 
5     49    215 
0     49    264    Check sum = 24 x 11 

Notice that the effort involved in constructing Table 2.1 can be reduced by 
performing each individual addition modulo 11. For example, the fourth row of 
8, 18, 29 becomes 


8, 8+10=7, 7+11=7 (mod 11) 

and the next row is 

5, 5+7=1, 1+7=8 


You should check that the complete new table is as follows: 

c_1   c_2   c_3 
0     0     0 
1     1     1 
9     10    0 
8     7     7 
5     1     8 
9     10    7 
6     5     1 
6     0     1 
5     5     6 
0     5     0 


Clearly there is no difficulty in doing all the arithmetic mentally, unlike (2.27) 
which usually requires the use of a calculator. The ISBN is correct if the last 
entry in the table is zero. 
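
The tabular method with each addition reduced modulo 11 can be sketched as follows. The function name `tabular_check` is invented for this illustration; the two running sums play the roles of columns $c_2$ and $c_3$.

```python
def tabular_check(isbn):
    """Return the rows (c1, c2, c3) of the table, with all sums taken modulo 11."""
    c2 = c3 = 0
    rows = []
    for ch in isbn.replace("-", ""):
        c1 = 10 if ch == "X" else int(ch)
        c2 = (c2 + c1) % 11   # rule (ii): b2 = b1 + a2
        c3 = (c3 + c2) % 11   # rule (ii): b3 = b2 + a3
        rows.append((c1, c2, c3))
    return rows


rows = tabular_check("0-19-859665-0")
is_correct = rows[-1][2] == 0   # the ISBN is correct if the last entry is zero
```

Starting the running sums at zero makes the three entries of the first row identical, as rule (i) requires.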


We can now see what happens in general. Apply the rules (i) and (ii) to an 
arbitrary ISBN $x_1 x_2 \ldots x_{10}$ to produce the following table: 

$$\begin{array}{lll}
c_1 & c_2 & c_3 \\
x_1 & x_1 & x_1 \\
x_2 & x_1 + x_2 & 2x_1 + x_2 \\
x_3 & x_1 + x_2 + x_3 & 3x_1 + 2x_2 + x_3 \\
x_4 & x_1 + x_2 + x_3 + x_4 & 4x_1 + 3x_2 + 2x_3 + x_4 \\
\vdots & \vdots & \vdots \\
x_{10} & x_1 + x_2 + \cdots + x_{10} & T = 10x_1 + 9x_2 + 8x_3 + \cdots + 2x_9 + x_{10}
\end{array}$$

The last entry $T$ in column $c_3$ can be written as 

$$T = 11(x_1 + x_2 + \cdots + x_9 + x_{10}) - S$$

where $S$ is the check sum defined in (2.27). You can now see why $T$ can be used as 
an alternative form of check sum, since clearly it is a multiple of 11 if and only if $S$ 
is a multiple of 11. 


EXERCISE 2.31 Test whether the following are correct ISBNs: 
(a) 0-87-150702-2, (b) 0-13-152447-X. 


EXERCISE 2.32 Use (2.28) to determine the check digit for an ISBN whose first nine 
digits are 039330711. A simplification of the procedure is given in Problem 2.8. 


The ISBN code detects all single errors, for in this case the check sum $S$ in 
(2.27) is not a multiple of 11. To see this, suppose that our proposed ISBN is 
$x_1 x_2 \ldots y_p \ldots x_{10}$, where the $p$th digit is $y_p = x_p + e$, where $e \neq 0$ is the error. Then the 
check sum $S$ is 

$$\begin{aligned}
S &= x_1 + 2x_2 + 3x_3 + \cdots + p y_p + \cdots + 10x_{10} \\
&= \sum_{i=1}^{10} i x_i + pe \\
&\equiv pe \pmod{11} \qquad (2.29)
\end{aligned}$$

since $x_1 \ldots x_{10}$ is a correct ISBN. The crucial fact that we now use is that 11 is 
a prime number, and both the position $p$ and the error $e$ are non-zero integers 
smaller than 11 in magnitude, so their product cannot be a 
multiple of 11. Hence $S$ in (2.29) cannot be 0 (mod 11), so the error has been 
detected. In fact, if it is known which digit is in error, then we can actually correct it. 
Again, this is best seen from a numerical example. 


@ EXAMPLE 2.28 


Suppose that the fifth digit in the ISBN in Example 2.27 has been accidentally 
obliterated, so that it is recorded as 0198x96650. We wish to determine x. We 
construct the check sum, either using the tabular method, or directly from 
(2.27). The latter gives 

$$\begin{aligned}
S &= 1 \cdot 0 + 2 \cdot 1 + 3 \cdot 9 + 4 \cdot 8 + 5 \cdot x + 6 \cdot 9 + 7 \cdot 6 + 8 \cdot 6 + 9 \cdot 5 + 10 \cdot 0 \\
&= 250 + 5x \\
&\equiv 8 + 5x \pmod{11}
\end{aligned}$$

and the required value of x is that which makes S a multiple of 11. Simply 
by trying successive values of x = 1, 2, 3, ... we find that x = 5 gives 
S = 33 = 0 (mod 11). 
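
The trial-and-error search of this example is easy to mechanize. The sketch below is illustrative only: the function name `recover_digit` and the convention of marking the illegible digit with a lower-case `x` are assumptions made here.

```python
def recover_digit(isbn_with_gap, gap="x"):
    """Find the value of the single unknown digit that makes S a multiple of 11."""
    chars = isbn_with_gap.replace("-", "")
    p = chars.index(gap) + 1  # position of the unknown digit, counted from 1
    known = sum(
        i * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(chars, start=1)
        if i != p
    )
    # Solve known + p * x = 0 (mod 11) by trying each possible value in turn.
    for x in range(11):
        if (known + p * x) % 11 == 0:
            return x


missing = recover_digit("0198x96650")
```

The search runs over 0 to 10 because arithmetic is modulo 11; a result of 10 would only make sense for the check digit, where it is written X.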


EXERCISE 2.33 An ISBN is received with one digit illegible, namely 09481014786. 
Determine the missing digit. 


If a received ISBN is found to be incorrect and it is not known which digits are in 
error, then there will be many possible correct ISBNs. Even if we make our usual 
assumption that a single error is the most likely occurrence we will not be able to 
correct it. 


EXERCISE 2.34 An ISBN is received as 0-471-62187-3. 


(a) Show that it is incorrect. 

(b) Determine the correct check digit, assuming the other digits are correct. 

(c) Assume that the original check digit is correct, but there is a single error in 
one of the digits $x_1, x_2, \ldots, x_9$. Determine three possible ISBNs. 


It should be stressed that in Example 2.28 and in problems like Exercise 2.33 a 
unique answer is always obtained. This is because the set of integers modulo 11 
which we use for the ISBN is an example of what is called a finite field, or Galois 
field, after the French mathematician who was killed in a duel in 1832 at the age of 
only 20 (he was fighting not about mathematics, but over a matter of ‘honour’!). 
Without becoming technical, for our purposes we can think of a finite field (of order 
N) as a set of N elements which it is possible to add, subtract, multiply and divide 
(but division by 0 is not defined). The notation GF(N) is commonly used. Every 
element $\alpha$ in the field has an additive inverse, which means that there is another 
element $\beta$ such that $\alpha + \beta = 0$. Every non-zero element $\alpha$ has a multiplicative 
inverse, which means that there is an element $\gamma$ in the field such that $\alpha\gamma = 1$. We 
write $-\alpha$ for $\beta$, and $\alpha^{-1}$ for $\gamma$. 


@ EXAMPLE 2.29 


We mentioned in Section 2.1 that modular arithmetic provides a way of 
constructing finite fields. Consider the set of integers 0, 1,2,...,10 where we 
perform arithmetic modulo 11. This set of 11 numbers forms GF(11). Addition 
and subtraction are as described earlier. The additive inverse is easily handled, 
for example 


9 + 2 = 0 (mod 11), so −9 = 2 


The multiplicative inverse requires a little more thought. For example, to find 
this for the integer 9, simply evaluate 


9 x 1 = 9, 9 x 2 = 18 = 7 (mod 11), 9 x 3 = 27 = 5 (mod 11), ... 


until we find a product which is 1 (mod 11). Clearly 9 x 5 = 45 = 1 (mod 11), so we 
have found that $9^{-1} = 5$. 
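
The search for inverses described in this example can be written out directly. The following Python sketch uses invented function names and simply tries each candidate in turn, exactly as in the hand calculation.

```python
def additive_inverse(a, p=11):
    """The element b with a + b = 0 (mod p)."""
    return (-a) % p


def multiplicative_inverse(a, p=11):
    """The element y with a * y = 1 (mod p), found by trying 1, 2, ..., p - 1."""
    if a % p == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    for y in range(1, p):
        if (a * y) % p == 1:
            return y


minus_nine = additive_inverse(9)
nine_inverse = multiplicative_inverse(9)
```

Because 11 is prime, the loop always finds an inverse for a non-zero element. The built-in call `pow(a, 9, 11)` gives the same value, since $a^{10} = 1$ for every non-zero $a$ in GF(11).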


EXERCISE 2.35 For each of the remaining 10 members of the set of integers modulo 11, 
determine the additive and multiplicative inverse. 


EXERCISE 2.36 For any non-zero element $a$ in a finite field F there exists a multiplicative 
inverse $a^{-1}$ such that $a^{-1}a = 1$. Use this to prove that if a, b are any elements in F 
then ab = 0 implies either a = 0 or b = 0 (you may assume that the usual distributive, 
associative and commutative laws apply for finite fields, namely a+b=b+a, 
ab=ba, (a+b)+c=a+(b+c), (ab)c=a(bc) and a(b+c)=ab+ac). Note that 
this result would not apply, for example, to the set of integers modulo 10, since 
5 x 2 = 0 (mod 10). 


In fact, finite fields can consist of sets of polynomials, as well as of integers, but 
exploring this subject any further is beyond the scope of this book. It is most 
interesting to realize, however, that what was once thought to be a purely abstract piece 
of mathematics is now a crucial tool in the development of error-correcting codes. 

In transmitting or recording information in digital form, a common error is to 
interchange (or ‘transpose’) accidentally two digits, usually neighbouring ones. We 
now show that the ISBN can detect any double error of this type. Suppose that 
$x_1 \ldots x_{10}$ is a correct ISBN but the received number is 

$$x_1 x_2 \ldots x_k \ldots x_j \ldots x_{10}$$

with $x_j$ and $x_k$ interchanged in error, so that $x_k$ now occupies position $j$ and 
$x_j$ occupies position $k$. 

The check sum (2.27) now becomes 

$$S = 1x_1 + 2x_2 + \cdots + jx_k + \cdots + kx_j + \cdots + 10x_{10} \tag{2.30}$$

In order to relate this sum to (2.27), add and subtract terms $kx_k$, $jx_j$ to (2.30) as follows: 

$$\begin{aligned}
S &= 1x_1 + 2x_2 + \cdots + jx_k - kx_k + kx_k + \cdots + kx_j - jx_j + jx_j + \cdots + 10x_{10} \\
&= 1x_1 + 2x_2 + \cdots + (j - k)x_k + kx_k + \cdots + (k - j)x_j + jx_j + \cdots + 10x_{10} \\
&= \sum_{i=1}^{10} i x_i + (j - k)x_k + (k - j)x_j \qquad (2.31) \\
&\equiv (j - k)(x_k - x_j) \pmod{11} \qquad (2.32)
\end{aligned}$$

where the first term in (2.31) is 0 (mod 11) because $x_1 x_2 \ldots x_{10}$ is a correct ISBN. 
The expression in (2.32) cannot be 0 (mod 11), because $j \neq k$, and $x_j \neq x_k$ (if $x_j = x_k$ 
there is no error!), and we have seen in Exercise 2.36 that a key property of finite 
fields is that the product of any two non-zero elements is non-zero. Therefore, the 
check sum (2.30) is not a multiple of 11, so errors involving the interchange of any 
two unequal digits are always detected. 
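
The conclusion drawn from (2.32) can also be confirmed by brute force. The sketch below, with invented names, evaluates $(j - k)(x_k - x_j)$ for every pair of positions and every pair of unequal digit values, and collects any case in which the result is zero modulo 11.

```python
from itertools import combinations


def transposition_shift(j, k, xj, xk, p=11):
    """Value of the check sum (2.32) after the digits in positions j and k swap."""
    return ((j - k) * (xk - xj)) % p


# Collect every swap of two unequal digits that would leave the check sum at zero.
undetected = [
    (j, k, xj, xk)
    for j, k in combinations(range(1, 11), 2)
    for xj in range(11)
    for xk in range(11)
    if xj != xk and transposition_shift(j, k, xj, xk) == 0
]

# For comparison: the same expression modulo 10 can vanish.
slips_through_mod_10 = transposition_shift(1, 6, 0, 2, p=10) == 0
```

The list `undetected` comes out empty for the modulus 11, whereas with modulus 10 a swap of positions 1 and 6 holding the digits 0 and 2 would pass unnoticed, which is the difficulty with composite moduli noted in Exercise 2.36.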


EXERCISE 2.37 Verify that if the ISBN in Example 2.27 is incorrectly recorded as 0-19- 
895665-0 then the transposition error is detected. 


We now introduce a decimal code which, unlike the ISBN, will correct all 
single errors. This consists of all the codewords $x_1 x_2 \ldots x_{10}$ which satisfy the two 
check equations 

$$S_1 = \sum_{i=1}^{10} x_i \equiv 0 \pmod{11} \tag{2.33}$$

$$S_2 = \sum_{i=1}^{10} i x_i \equiv 0 \pmod{11} \tag{2.34}$$

where the $x_i$ take the values 0, 1, 2, ..., 8, 9, there now being two check digits $x_9$ and 
$x_{10}$. Notice that $S_2$ is the same as $S$ in (2.27), so the second check equation above is 
the same as the single check equation for the ISBN, but now we do not allow $x_{10}$ to 
take the value X (= 10). Suppose that a single error of magnitude $e$ ($\neq 0$) occurs in 
the $j$th position, so that the received word is 

$$x_1 x_2 \ldots x_{j-1} (x_j + e) x_{j+1} \ldots x_{10}$$

From the definition (2.33) the first check sum for this word is 

$$S_1 = \sum_{i=1}^{10} x_i + e \equiv e \pmod{11}$$

showing that the magnitude of the error is equal to $S_1$ (mod 11). Similarly, from 
(2.34) the second check sum is 

$$S_2 = \sum_{i=1}^{10} i x_i + je \equiv je \pmod{11}$$

Bearing in mind throughout that we are working with the finite field GF(11), we 
can write 

$$j = S_2 e^{-1} = S_2 S_1^{-1}$$

which gives the position of the error (reading from left to right in the received 
word). 


@ EXAMPLE 2.30 


Suppose that a received word is 1764753052. To obtain $S_1$ in (2.33), sum the 
digits to get $S_1 = 40 = 7$, which is the magnitude $e$ of the error. The weighted 
sum $S_2 = x_1 + 2x_2 + \cdots + 10x_{10}$ in (2.34) is easily found to be $S_2 = 200 = 2$. Hence 
the position of the error is $j = 2 \cdot 7^{-1} = 2 \cdot 8 = 16 = 5$ (you were asked to find 
multiplicative inverses (mod 11) in Exercise 2.35; here $8 \cdot 7 = 56 = 1$ so $7^{-1} = 8$). 
The correct codeword is therefore 1764053052 (notice that the corrected fifth 
digit is obtained by subtracting the error $e$ from the incorrect received fifth 
digit). 


In general, the decoding scheme is as follows: 


(i) If $S_1 = 0$, $S_2 = 0$ then assume there is no error and the received word is a 
codeword. 

(ii) If $S_1 \neq 0$, $S_2 \neq 0$ then we assume that a single error has occurred in the 
digit with position $S_2 S_1^{-1}$, and this digit is corrected by subtracting $S_1$ from 
it. 

(iii) If $S_1 = 0$ or $S_2 = 0$ (not both) then we have detected at least two errors. 


It is interesting to realize that the above 10 digit code could be used as a 
telephone number code. There are over 82 million numbers which satisfy the two 
check equations (2.33) and (2.34), enough for all the telephones in the UK. The 
telephone exchange could use the decoding scheme described, so that if you keyed in 
one wrong digit you would still have your call placed correctly; and (see Exercise 
2.39 below) if you inadvertently interchanged two digits (easily done!) then instead 
of getting a ‘wrong number’ you would get an error signal (‘number unobtainable’ 
tone). Just think how much frustration would be saved if a code like this was 
adopted countrywide! 
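
The decoding scheme (i) to (iii), and the count of available numbers, can be sketched in Python as follows. The function names are invented for this illustration. Two points are assumptions rather than part of the scheme as stated: a correction that would produce the value 10, which is not a permitted digit, is reported as a detected error, and the inverse $S_1^{-1}$ is computed as $S_1^9$, which is valid because $a^{10} = 1$ for non-zero $a$ in GF(11).

```python
P = 11  # all arithmetic is in GF(11)


def check_sums(digits):
    """Return (S1, S2) as defined in (2.33) and (2.34), reduced modulo 11."""
    s1 = sum(digits) % P
    s2 = sum(i * d for i, d in enumerate(digits, start=1)) % P
    return s1, s2


def sec_decode(word):
    """Return the decoded 10 digit string, or None when errors are only detected."""
    digits = [int(ch) for ch in word]
    s1, s2 = check_sums(digits)
    if s1 == 0 and s2 == 0:
        return word                       # case (i): no error assumed
    if s1 == 0 or s2 == 0:
        return None                       # case (iii): at least two errors
    j = (s2 * pow(s1, P - 2, P)) % P      # case (ii): position S2 * S1^(-1)
    corrected = (digits[j - 1] - s1) % P
    if corrected == 10:
        return None                       # assumption: 10 is not a legal digit
    digits[j - 1] = corrected
    return "".join(str(d) for d in digits)


def count_codewords(length=10):
    """Count digit strings with S1 = S2 = 0 by tracking partial check sums."""
    counts = {(0, 0): 1}
    for i in range(1, length + 1):
        nxt = {}
        for (s1, s2), n in counts.items():
            for d in range(10):
                key = ((s1 + d) % P, (s2 + i * d) % P)
                nxt[key] = nxt.get(key, 0) + n
        counts = nxt
    return counts.get((0, 0), 0)


decoded = sec_decode("1764753052")
available_numbers = count_codewords()
```

Applied to the received word of Example 2.30 the sketch returns 1764053052. The counting routine avoids listing all ten-digit strings by keeping only the number of prefixes that lead to each pair of partial check sums.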


EXERCISE 2.38 Decode the received words 0206211909 and 061293587 using the above 
code. 


EXERCISE 2.39 Show that the above code detects all errors where two (different) digits of 
a codeword have been interchanged. 


EXERCISE 2.40 Show that the decimal code consisting of all words $x_1 x_2 \ldots x_{10}$ with 
$x_i \in \{0, 1, 2, \ldots, 9\}$, satisfying the check equations 

$$\sum_{i=1}^{10} x_i \equiv 0 \pmod{10}, \qquad \sum_{i=1}^{10} i x_i \equiv 0 \pmod{10}$$

is not a single-error-correcting code. (Hint: find two codewords which are distance 2 
apart.) 


We conclude this chapter with a decimal code of length 10, defined as before 
over GF(11), but which corrects all double errors. We begin with the s.e.c. decimal 
code just described, and select those codewords which satisfy in addition to (2.33) 
and (2.34) the extra two check equations 

$$S_3 = \sum_{i=1}^{10} i^2 x_i \equiv 0, \qquad S_4 = \sum_{i=1}^{10} i^3 x_i \equiv 0$$

Suppose that two errors of magnitudes $e_1$ and $e_2$ occur in positions $i$ and $j$ 
respectively. Using the same argument as for the s.e.c. case we obtain 

$$\begin{array}{ll}
S_1 = \sum_{k=1}^{10} x_k + e_1 + e_2 = e_1 + e_2 & (1) \\
S_2 = i e_1 + j e_2 & (2) \\
S_3 = i^2 e_1 + j^2 e_2 & (3) \\
S_4 = i^3 e_1 + j^3 e_2 & (4)
\end{array}$$

where all arithmetic is modulo 11. We therefore have the four equations (1) to (4) to 
solve for the four unknowns $i$, $j$, $e_1$, $e_2$. Although these equations are non-linear in $i$ 
and $j$, they can be put into a form which is much easier to solve. The trick is to 
eliminate $e_1$, $e_2$ and $i$ so as to obtain a quadratic equation in $j$. Carry out the 
operations indicated below; for example, the first one is to subtract equation (2) 
from $j$ times equation (1). 

$$\begin{array}{lll}
j \times (1) - (2): & jS_1 - S_2 = (j - i)e_1 & (5) \\
j \times (2) - (3): & jS_2 - S_3 = (j - i)i e_1 & (6) \\
j \times (3) - (4): & jS_3 - S_4 = (j - i)i^2 e_1 & (7) \\
(6)^2 = (5) \times (7): & (jS_2 - S_3)^2 = (jS_1 - S_2)(jS_3 - S_4) &
\end{array}$$

Rearranging the last equation gives 

$$j^2(S_2^2 - S_1 S_3) + j(S_1 S_4 - S_2 S_3) + (S_3^2 - S_2 S_4) = 0 \tag{2.35}$$

EXERCISE 2.41 Verify that by eliminating $e_1$, $e_2$ and $j$ from the equations (1) to (4) you 
obtain a quadratic equation in $i$ with the same coefficients as those in (2.35). 


In view of the above exercise, we conclude that the locations of the errors are 
the two roots of the quadratic equation 

$$ax^2 + bx + c = 0 \tag{2.36}$$

where 

$$a = S_2^2 - S_1 S_3, \quad b = S_1 S_4 - S_2 S_3, \quad c = S_3^2 - S_2 S_4 \tag{2.37}$$

Once $i$ and $j$ have been found, it is very easy to determine $e_1$ and $e_2$ from the two 
linear equations (1) and (2). Notice that if just one error occurs (say $e_1 \neq 0$, $e_2 = 0$) 
then 

$$S_1 = e_1, \quad S_2 = i e_1, \quad S_3 = i^2 e_1, \quad S_4 = i^3 e_1$$

in which case substituting into (2.37) gives $a = 0$, $b = 0$, $c = 0$. 
We can summarize the decoding scheme as follows: 


(i) If $S_1 = S_2 = S_3 = S_4 = 0$, then assume there is no error and the received 
word is a codeword. 

(ii) If $S = (S_1, S_2, S_3, S_4) \neq 0$ and $a = b = c = 0$ then we assume that a single 
error has occurred in digit position $S_2 S_1^{-1}$, which is corrected by subtracting 
$S_1$ from it (just as for the s.e.c. code). 

(iii) If $S \neq 0$ and $a \neq 0$, $b \neq 0$ then (2.36) has solutions 

$$i, j = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{2.38}$$

provided $b^2 - 4ac$ is a non-zero square in GF(11). We assume there are 
two errors in positions $i$, $j$. These digits are corrected by subtracting 
respectively $e_1$ and $e_2$ from them, where 

$$e_2 = \frac{iS_1 - S_2}{i - j}, \qquad e_1 = S_1 - e_2 \tag{2.39}$$

(iv) If none of (i), (ii) or (iii) holds, then we have detected at least three 
errors. 

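An explanation is necessary of what we mean by square roots in GF(11). For 
example, 

$$7^2 = 49 = 5 \pmod{11}$$

so we can write $\sqrt{5} = 7$. Construct the following table of squares of integers less than 
11: 

$$\begin{array}{c|cccccccccc}
x & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline
x^2 & 1 & 4 & 9 & 5 & 3 & 3 & 5 & 9 & 4 & 1
\end{array}$$

If we read from the bottom row to the top this gives all the available square roots. 
This can be written as follows: 

$$\begin{array}{c|ccccc}
y & 1 & 3 & 4 & 5 & 9 \\ \hline
\sqrt{y} & 1 & 5 & 2 & 4 & 3
\end{array}$$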

Notice that in some cases the square root is not unique. For example, we see that 
$4^2 = 7^2 = 5$, so that $\sqrt{5} = 4$ or 7. However, this does not affect the solutions of (2.36), 
because $4 = -7 \pmod{11}$ and therefore $\pm 4 = \mp 7$ in (2.38), showing that the same 
pair of values $i$, $j$ is obtained whichever square root is used. 

Since 2, 6, 7, 8 and 10 do not have square roots (mod 11) (they do not occur in the 
bottom row of the table of squares) then if $b^2 - 4ac$ in (2.38) takes one of these 
values we have detected more than two errors. 
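
The table of squares and the list of available square roots can be generated mechanically. The variable names in this Python sketch are invented for illustration.

```python
# Squares of the non-zero elements of GF(11).
squares = {x: (x * x) % 11 for x in range(1, 11)}

# Invert the table: for each value y, list every x with x * x = y (mod 11).
roots = {}
for x, y in squares.items():
    roots.setdefault(y, []).append(x)

# Non-zero elements that never appear as a square have no square root.
non_squares = [y for y in range(1, 11) if y not in roots]
```

Each value that has a square root has exactly two, $r$ and $11 - r$, which is why either choice gives the same pair of roots in (2.38).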

Using techniques beyond the scope of this book it can be shown that the 
minimum distance for this code is 5, verifying by Theorem 2.1 that the code does 
indeed correct all double errors. 




@ EXAMPLE 2.31 


For the above code, decode the received word 1204000910. 
We compute, using modulo 11 arithmetic throughout: 

$$S_1 = \sum x_i = 1 + 2 + 4 + 9 + 1 = 17 = 6$$
$$S_2 = \sum i x_i = 1 \cdot 1 + 2 \cdot 2 + 4 \cdot 4 + 8 \cdot 9 + 9 \cdot 1 = 102 = 3$$
$$S_3 = \sum i^2 x_i = 1^2 \cdot 1 + 2^2 \cdot 2 + 4^2 \cdot 4 + 8^2 \cdot 9 + 9^2 \cdot 1 = 730 = 4$$
$$S_4 = \sum i^3 x_i = 1^3 \cdot 1 + 2^3 \cdot 2 + 4^3 \cdot 4 + 8^3 \cdot 9 + 9^3 \cdot 1 = 5610 = 0$$

From (2.37) we have 

$$a = S_2^2 - S_1 S_3 = 9 - 24 = -15 = 7$$
$$b = S_1 S_4 - S_2 S_3 = 0 - 12 = 10$$
$$c = S_3^2 - S_2 S_4 = 16 - 0 = 5$$

so that case (iii) of the decoding scheme applies. We therefore need to solve 
the quadratic equation (2.36), namely 

$$7x^2 + 10x + 5 = 0$$

which has roots 

$$x = \frac{-10 \pm \sqrt{100 - 140}}{14} = \frac{-10 \pm \sqrt{-40}}{3} = \frac{-10 \pm \sqrt{4}}{3} = \frac{-10 \pm 2}{3} = -8 \times 3^{-1}, \; -12 \times 3^{-1}$$

These roots reduce as follows: 

$$-8 \times 3^{-1} = 3 \times 4 = 12 = 1$$
$$-12 \times 3^{-1} = 10 \times 4 = 40 = 7$$

Hence the errors occur in digits $i = 1$ and $j = 7$. From (2.39) 

$$e_2 = \frac{1 \cdot 6 - 3}{1 - 7} = 3(-6)^{-1} = 3 \cdot 5^{-1} = 3 \cdot 9 = 27 = 5$$

$$e_1 = 6 - 5 = 1$$

so the corrected first and seventh digits are respectively 1 − 1 = 0, 0 − 5 = 6, and 
the decoded word is 0204006910. 
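
The whole double-error decoding procedure can be gathered into one short routine. The following Python sketch is an illustration under stated assumptions, not a definitive implementation: the function names are invented, a correction that would produce the value 10 is treated as a detected error, and inverses are computed as ninth powers. In the two-error case the sketch tests only that $a \neq 0$ and that $b^2 - 4ac$ is a non-zero square; it deliberately does not require $b \neq 0$, because expanding (2.37) with (1) to (4) gives $b = -a(i + j)$, which is zero for two errors whose positions add up to 11.

```python
P = 11  # all arithmetic is in GF(11)


def inv(a):
    """Multiplicative inverse in GF(11), using a^10 = 1 for non-zero a."""
    return pow(a % P, P - 2, P)


def sqrt_mod(y):
    """A square root of y in GF(11), or None if y is zero or has no root."""
    for r in range(1, P):
        if (r * r) % P == y % P:
            return r
    return None


def dec_decode(word):
    """Correct up to two errors in a 10 digit string; None means errors detected."""
    x = [int(ch) for ch in word]
    # S1..S4 are the sums of pos^k * digit for k = 0, 1, 2, 3.
    s1, s2, s3, s4 = (
        sum(pow(pos, k, P) * d for pos, d in enumerate(x, start=1)) % P
        for k in range(4)
    )
    if s1 == s2 == s3 == s4 == 0:
        return word                                   # case (i)
    a = (s2 * s2 - s1 * s3) % P
    b = (s1 * s4 - s2 * s3) % P
    c = (s3 * s3 - s2 * s4) % P
    if a == b == c == 0:                              # case (ii): one error
        if s1 == 0 or s2 == 0:
            return None
        fixes = {(s2 * inv(s1)) % P: s1}
    else:                                             # case (iii): two errors
        root = sqrt_mod(b * b - 4 * a * c)
        if a == 0 or root is None:
            return None                               # case (iv)
        i = ((-b + root) * inv(2 * a)) % P
        j = ((-b - root) * inv(2 * a)) % P
        if i == 0 or j == 0:
            return None                               # not a valid position
        e2 = ((i * s1 - s2) * inv(i - j)) % P         # (2.39)
        e1 = (s1 - e2) % P
        fixes = {i: e1, j: e2}
    for pos, e in fixes.items():
        digit = (x[pos - 1] - e) % P
        if digit == 10:
            return None                               # assumption: 10 is not a digit
        x[pos - 1] = digit
    return "".join(str(d) for d in x)


decoded = dec_decode("1204000910")
```

For the received word of Example 2.31 the sketch finds errors in positions 1 and 7 and returns 0204006910.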


EXERCISE 2.42 For the above decimal code, decode the received word 4003100711. 


The decimal code described above is an example of an important class of codes 
called BCH codes, which can be developed to correct more than two errors. For 
example, in long-distance telecommunications, codes of length 255 with 24 check 
bits are used to correct three errors. Details of these codes belong in more advanced 
books. 


PROBLEMS 


2.1 The United States ‘Zip’ postcode was introduced in Example 2.6, with a particular 
case reproduced in Figure 2.4. The code is a number $a_1 a_2 \ldots a_9$ with nine decimal 
digits. The digits are represented in a machine-readable barcode form, according to 
the following scheme, where 0 corresponds to a short bar and 1 to a long bar: 


Decimal digit   Barcode 
1               00011 
2               00101 
3               00110 
4               01001 
5               01010 
6               01100 
7               10001 
8               10010 
9               10100 
0               11000 


Each digit is represented by a block containing two long and three short bars, for 
example 8 (10010) is long, short, short, long, short. Notice that there are no other 
possible arrangements of two long and three short bars. 

Ignoring the long ‘spacer’ bars at each end of the code, we see in Figure 2.4 that 
there are precisely 50 bars, which represent a 10 digit codeword $a_1 a_2 \ldots a_{10}$. The extra 
digit $a_{10}$ is the check digit defined by 

$$\sum_{i=1}^{10} a_i \equiv 0 \pmod{10}$$

That is, the sum of the digits is a multiple of 10. 


(a) The Zip code for North Carolina State University is 27695-8205. Determine 
the check digit. 
(b) Determine the Zip code represented by the barcode 


[barcode not reproduced] 


(c) Show that if the machine makes a single error in reading a barcode (i.e. a 
short bar is read as a long bar, or vice versa) then this error is always 
detected. 

(d) Show that if there is a single error, then once the location of the block in 
which this occurs has been determined, the error can be corrected. 

Hence determine the correct Zip code if the following barcode 
contains a single error. 


[barcode not reproduced] 


(e) Explain why if any two errors occur in a particular block of five bars, then 
these can always be detected. Explain also why some, but not all, such 
double errors can be corrected. 


2.2 The United States Postal Service money-order identification number, introduced in 
Example 2.9, consists of 10 decimal digits and a check digit. The check digit is the 
remainder modulo 9 of the 10 digit number, that is $x_1 x_2 \ldots x_{10} \equiv x_{11} \pmod 9$. 
(a) Prove that a correct identification number satisfies 

$$x_1 + x_2 + \cdots + x_{10} \equiv x_{11} \pmod 9$$

(b) Deduce that errors in which a 9 is replaced by a 0, or vice versa (excluding 
the check digit), go undetected. 

(c) Show also that errors involving the transposition of two adjacent digits will 
only be detected if they involve the check digit. What happens if the two 
transposed digits are not adjacent? 


2.3 A codeword $x_1 x_2 x_3 x_4 x_5$ consisting of five decimal digits is defined so that the check 
digit $x_5$ satisfies the condition 

$$x_1 + 3x_2 + 7x_3 + 9x_4 + x_5 \equiv 0 \pmod{10}$$

(a) Encode the number 9347. 

(b) Show that all single errors will be detected. 

(c) Will all transpositions of adjacent digits be detected? 

(d) Show that the ‘weights’ 1, 3, 7, 9 multiplying $x_1$, $x_2$, $x_3$, $x_4$ respectively are 
the only possible integers less than 10 which can be used if all single errors 
are to be detected. 


2.4 Write down the check matrix for the perfect Hamming code with four check 
bits. Encode the information message 10110111101. If a received word is 
010101100011010, determine the information bits of the transmitted message. 


2.5 An International Standard Book Number (ISBN) is recorded as 0-19-853827-3. Show 
that it is incorrect. Determine the correct ISBN, assuming that the sixth digit (reading 
from left to right) is in error. 

It is later found that the first six digits were correctly recorded, but that two other 
adjacent digits in the six-digit number assigned by the publisher had been inadver- 
tently interchanged. Show that this error cannot be corrected, by determining two 
possible ISBNs which satisfy the given conditions. 


2.6 Since 1966 all Norwegian citizens have been allocated an identification number 
$x_1 x_2 x_3 \ldots x_{11}$ consisting of 11 decimal digits. The first six digits represent the date of 
birth (day, month, year), $x_7 x_8 x_9$ is a personal number and $x_{10}$, $x_{11}$ are check digits 
defined by 

$$x_{10} = -(3x_1 + 7x_2 + 6x_3 + x_4 + 8x_5 + 9x_6 + 4x_7 + 5x_8 + 2x_9) \pmod{11}$$
$$x_{11} = -(5x_1 + 4x_2 + 3x_3 + 2x_4 + 7x_5 + 6x_6 + 5x_7 + 4x_8 + 3x_9 + 2x_{10}) \pmod{11}$$

Write down the check matrix for this code. Hence verify that the code will not detect 
double errors of the form $x_4 + \varepsilon$, $x_{10} + 11 - \varepsilon$, where $\varepsilon$ can take any of the values 
0, 1, 2, ..., 10. 


2.7 The double-error-correcting decimal code described in Section 2.6 can be extended to 
correct more than two errors. For example, to correct three errors we use six check 
equations $S_1 = 0$, $S_2 = 0$, ..., $S_6 = 0$ (mod 11), where 

$$S_{j+1} = \sum_{i=1}^{10} i^j x_i, \qquad j = 0, 1, 2, 3, 4, 5$$

The positions $p_1$, $p_2$, $p_3$ of the errors are then given by the three roots of the cubic 
equation 

$$x^3 + a_1 x^2 + a_2 x + a_3 = 0$$

where the $a_i$ are the solutions of the equations 

$$S_4 + a_1 S_3 + a_2 S_2 + a_3 S_1 = 0$$
$$S_5 + a_1 S_4 + a_2 S_3 + a_3 S_2 = 0$$
$$S_6 + a_1 S_5 + a_2 S_4 + a_3 S_3 = 0$$

Since all arithmetic is performed modulo 11, the roots of the cubic equation are found 
simply by trying the values x = 0, 1, 2, ..., 10. 

The corresponding magnitudes $e_1$, $e_2$, $e_3$ of the errors are then given by the 
solution of the set of equations 

$$\begin{pmatrix} 1 & 1 & 1 \\ p_1 & p_2 & p_3 \\ p_1^2 & p_2^2 & p_3^2 \end{pmatrix} \begin{pmatrix} e_1 \\ e_2 \\ e_3 \end{pmatrix} = \begin{pmatrix} S_1 \\ S_2 \\ S_3 \end{pmatrix}$$

Suppose that a word is received for which it is found that 

$$S_1 = 2, \quad S_2 = 8, \quad S_3 = 4, \quad S_4 = 5, \quad S_5 = 3, \quad S_6 = 2$$

Assuming that three transmission errors have occurred, find their positions and 
magnitudes. 

2.8 A simple way of determining the check digit $x_{10}$ for an ISBN using the expression 
(2.28) is to apply the fact that the remainder modulo 11 of a three-digit decimal 
number abc is equal to (c − b + a) (mod 11). For example, if the first nine digits of an 
ISBN are 039330711 (see Exercise 2.32) then from (2.28) we have 

$$x_{10} = \sum_{i=1}^{9} i x_i \pmod{11} = 126 \pmod{11} = (6 - 2 + 1) \pmod{11} = 5$$

Prove the stated fact. Also prove that in general for any n-digit decimal number 
$a_1 a_2 \ldots a_n$ its remainder (mod 11) is equal to 

$$(a_n - a_{n-1} + a_{n-2} - a_{n-3} + \cdots) \pmod{11}$$


2.9 The Universal Product Code used in the United States to identify retail products 
consists of a number $x_1 x_2 x_3 \ldots x_{11} x_{12}$ with 12 decimal digits. The first digit denotes 
the product type, the next five digits represent the manufacturer, and the next five 
digits are assigned to the product by the manufacturer. The final digit $x_{12}$ is the check 
digit defined by 

$$3(x_1 + x_3 + x_5 + x_7 + x_9 + x_{11}) + x_2 + x_4 + x_6 + x_8 + x_{10} + x_{12} \equiv 0 \pmod{10}$$


(a) Prove that the code detects all single errors. 
(b) Does the code detect all errors involving the transposition of two adjacent 
digits? 


2.10 The Article Number Association (ANA) System in the UK is a similar scheme to that 
in the previous problem. It consists of a number with 13 decimal digits, where the 
check digit $x_{13}$ is defined by 

$$3(x_2 + x_4 + x_6 + x_8 + x_{10} + x_{12}) + x_1 + x_3 + x_5 + x_7 + x_9 + x_{11} + x_{13} \equiv 0 \pmod{10}$$

and is based on the EAN system described in Example 2.5. For example, the 
Guardian newspaper on Mondays has the number 9770261307019. Here $x_1 x_2 x_3$ 
denotes the product type, $x_4$ retrieves the price from the memory of the shop’s 
computer, the next seven digits are the Guardian's code and $x_{12}$ is the day of the 
week, starting with 1 for Monday, 2 for Tuesday and so on. What is the Guardian's 
number on Fridays? 


2.11 (a) A simple decimal code consists of words $x_1 x_2 \ldots x_n c$, where the check digit $c$ is 
defined by 

$$x_1 + x_2 + \cdots + x_n \equiv c \pmod{10}$$

Deduce that the code detects all single errors in the digits $x_i$, but that no 
transpositions of adjacent x digits are detected. 
(b) An improvement is obtained by using the check equation 

$$x_1 - x_2 + x_3 - x_4 + x_5 - \cdots \equiv c \pmod{10}$$

Show that this still detects all single errors in the x digits, and also detects any 
error in which $x_i$ and $x_{i+1}$ are interchanged except when $x_i - x_{i+1} = \pm 5$. Hence deduce 
that 8/9 of all such transpositions are detected. 


3.1 INTRODUCTION AND EXAMPLES 


Perhaps one of the basic human drives is the desire to have control over events. From 
ancient times people have appealed to their deity to ‘make things happen’ as they 
would like them to. As technology has developed, devices have been invented which 
perform a controlling function without the need for human supervision. One of the 
oldest of these mechanisms can be traced back to the Greek civilization of the first 
century AD: you will certainly be familiar with the domestic toilet where, after 
flushing, the water tank automatically fills up to its original level. Indeed, if you lift 
off the cistern lid you are likely to find a line inside marking the normal water level. 
The job of the water-supplying device is to maintain the water in the tank as close as 
possible to this fixed level. Identical mechanisms operate in many households for the 
main cold water cistern; and for the ‘header’ tank which ensures that the radiators of 
a hot water central heating system are kept full of water. The principle behind the 
water regulating system is simple but ingenious, and is illustrated in Figure 3.1. 

A floating ball monitors the water level; when this is below the desired level the 
supply valve is open and water flows into the cistern. The float rises with the water 
level, and when this reaches its preset position the supply valve shuts off. The valve is a 
purely mechanical device which is operated by the rigid arm attached to the float. This 
device for regulating the water level in a cistern exhibits several characteristics which 
are common to many control systems. First, the operation is automatic: when 
working correctly it needs no human intervention. Secondly, it uses the idea of 
feedback, which means that a knowledge of the actual water level is ‘fed back’ by 
means of the float and arm so as to control the supply valve. Thirdly, it will still work 
well even if there are wide fluctuations in the pressure of the incoming water; for 
example, if the water pressure drops the tank will simply take longer to be filled up. 


Figure 3.1 (labels: water supply, ball float) 

You may have noticed the introduction of the expression control system: this is a 
term which is used in a wide sense to describe many situations. Some examples of 
control systems are: 


(i) An aircraft, where one of many control problems is to fly and land 
using the ‘autopilot’. 

(ii) A manufacturing process, where the objective is to control the cost and 
quality of the end product. 

(iii) The human body, where many functions are regulated automatically 
without our conscious intervention; for example, body temperature is 
kept so nearly constant that if it rises by even 1 degree we suspect that
we are ill; when playing tennis or other games we are able to keep our 
eyes on the ball even though we are moving around vigorously. 

(iv) A motor vehicle, which nowadays is full of control devices — for 
example, the fuel injection system which controls the fuel supply to the 
engine; an automatic gearbox; an antilock braking system which 
prevents the car skidding when the brakes are applied suddenly. 

(v) The economy of a country or region where it is required to control, for 
example, the level of unemployment and the rate of inflation. 


You should be aware that although the control systems mentioned above are all 
quite different from each other, they do have certain features in common. We can think 
of them as consisting of a lot of interconnected parts whose interactions we may or 
may not fully understand. Our aim is to make the system behave in some desired 
fashion by suitably controlling the inputs (or control variables) so as to produce 
satisfactory outputs. For example, in the plumbing system of Figure 3.1, the output is 
the water level, and the input is the water supply to the cistern. Don’t imagine, 
however, that it’s always necessary to have an accurate mathematical model of a 
system before it can be controlled. After all, most people manage to learn the tricky 
balancing act of riding a bicycle without ever thinking of the mathematics involved! 
Figure 3.2 (block diagram: Inputs enter the Controlled system, which produces Outputs; Output monitoring supplies Feedback to the Controlling device, which adjusts the Inputs)


The idea of feedback is a crucial one. To take another homely example: in a 
central heating system the room thermostat is set to a comfortable level (say 20 °C). 
The thermostat monitors the actual temperature and ‘feeds back’ this information to 
the heat supply, turning it on or off according to whether the room temperature is 
above or below the desired setting. The input here is the heat supplied by the boiler 
or other heat source, and the output is the room temperature. 
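
The on/off feedback rule just described can be tried out numerically. What follows is only a minimal sketch in Python, assuming a crude discrete-time heat balance; the function name and every numerical value (heating rate, heat-loss coefficient, outside temperature) are illustrative assumptions and are not taken from the text.

```python
def simulate_thermostat(setpoint=20.0, outside=5.0, start=12.0,
                        heat_rate=0.8, loss_coeff=0.05, steps=200):
    """Discrete-time on/off thermostat; all numbers are illustrative."""
    temp = start
    history = []
    for _ in range(steps):
        # Feedback: the measured output (room temperature) decides the input.
        heater_on = temp < setpoint
        heat_in = heat_rate if heater_on else 0.0
        # Heat gained from the boiler minus heat lost to the colder outside.
        temp += heat_in - loss_coeff * (temp - outside)
        history.append(temp)
    return history


temps = simulate_thermostat()
# Once the room has warmed up, the temperature hovers in a narrow band
# around the setting.
print(min(temps[60:]), max(temps[60:]))
```

With these assumed values the room warms up from 12 °C and then stays within about a degree of the 20 °C setting, switching the heat on and off as it goes.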

We can represent a typical situation as in Figure 3.2. The outputs are measured 
and this information is fed back to the controlling device which modifies the inputs 
(if necessary) so as to produce the desired behaviour. For example, an automatic 
landing system for an aircraft will be designed to ensure that at the instant when the 
wheels touch the runway, the aircraft has zero velocity in the vertical direction, so 
that it touches down without a bump. In order to achieve this many items such as 
altitude, airspeed, position of flaps and rudder will have to be monitored, and this 
information is fed back to the autopilot which is flying the aircraft. Because of the 
appearance of the diagram in Figure 3.2 the complete set-up is called a closed loop 
system, since the information from the output flows back around the ‘loop’. In real 
life there will often be unpredictable external disturbances which affect the system 
under consideration — for example, sudden gusts of wind can occur as an aircraft 
approaches the runway, or the temperature outside a centrally heated building may 
suddenly drop. A well-designed control system will be able to cope with such 
unexpected influences. 

During the nineteenth century many feedback devices were invented, the most 
famous and widely used being James Watt’s governor for regulating the speed of 
steam engines. An early attempt to investigate some mathematical problems of 
control was made by Airy in 1840 when he was Astronomer Royal. He was 
interested in keeping a telescope pointed at a fixed point in the sky even though the 
Earth is rotating. The book by Mayr (1970) gives a good account of the origins of 
feedback control. 

Let’s now look at some examples of mathematical models of control systems. 


EXAMPLE 3.1


Let’s return to the bank savings model introduced in Example 1.1 in Chapter 1. 
Recall that x(k) denotes the amount in the account at the end of the kth time
period, and that at the end of each period an amount of interest is added at the
rate of r/100n on the balance at the beginning of the period. A net amount of 
u(k) is deposited in the account during the kth period, but this does not earn 
interest until the next time period (a negative u(k) indicates a net withdrawal). 
The equation describing the behaviour of the account was found to be 


$$x(k+1) = \left(1 + \frac{r}{100n}\right) x(k) + u(k+1), \qquad k = 0, 1, 2, \ldots \tag{3.1}$$

Here the variable x(k) denotes the state of the account at the end of the kth 
period, and in this example the output is simply equal to x(k). The control 
variable u(k) is used to determine a desired pattern of savings; that is, we 
exercise control over the sequence of outputs x(1), x(2), x(3), ... by selecting a 
suitable sequence of inputs u(1), u(2), u(3), .... 
A more general version of (3.1) is 


$$x(k+1) = \alpha x(k) + \beta u(k), \qquad k = 0, 1, 2, \ldots \tag{3.2}$$


where $\alpha$ and $\beta$ are constants. We have relabelled the sequence of inputs as
u(0), u(1), u(2), ... in (3.2) instead of u(1), u(2), u(3), ... in (3.1), so that we have
u(k) instead of u(k+1) on the right-hand side of the equation. This is purely a 
convention, but one which is widely used. 
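
A recurrence of this kind is easily iterated on a computer. The sketch below (Python) is illustrative only: the function name and the interest rate, opening balance and deposits are all assumed values chosen for the demonstration.

```python
def simulate_scalar(alpha, beta, x0, inputs):
    """Iterate x(k+1) = alpha*x(k) + beta*u(k); return [x(0), x(1), ...]."""
    xs = [x0]
    for u in inputs:
        xs.append(alpha * xs[-1] + beta * u)
    return xs


# Hypothetical figures: r = 6 (per cent per year), n = 12 periods per year,
# an opening balance of 1000 and a deposit of 100 in each of 12 periods.
alpha = 1 + 6 / (100 * 12)
balances = simulate_scalar(alpha, 1.0, 1000.0, [100.0] * 12)
print(round(balances[-1], 2))
```

Changing the list of inputs changes the pattern of balances, which is exactly the sense in which u(k) acts as the control variable.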


EXERCISE 3.1 Consider the savings account described in Exercise 1.2 in Chapter 1. You
were asked to show that, with the given sequence of inputs, the balance in the account 
at the end of 2 years would be £2493.96. Find how much you should deposit into the 
account so that after another 6 months have elapsed the account holds exactly £3000. 


An important area of control applications involves dynamical systems which
obey Newton's law of motion. This says that a body of mass m upon which a force u 
is exerted experiences an acceleration f where 


mf=u (3.3) 


EXAMPLE 3.2


Suppose a car having mass m is being driven along a straight, level road. For
simplicity assume that the car is controlled only by the throttle, producing an
accelerating force $u_1$ on the car, and by the brake which produces a retarding
force $u_2$ upon the car. Both of these forces will vary with time t in a continuous
fashion, so we write $u_1(t)$, $u_2(t)$ to express the fact that $u_1$ and $u_2$ are functions
of time. Suppose that we are only interested in the car's distance $x_1$ from some
starting point, and its velocity $x_2$. Again, these quantities $x_1(t)$ and $x_2(t)$ will
depend upon the time t which has elapsed since starting off. The acceleration f
will be the derivative of the velocity, that is


$$f = \frac{dx_2}{dt}$$

so Newton’s equation (3.3) becomes

$$m \frac{dx_2}{dt} = u_1 - u_2 \tag{3.4}$$

Notice that the total force on the car is $u_1 - u_2$; the force $u_2$ carries a negative
sign since it opposes the motion, that is, it has the effect of reducing the
velocity. To complete the picture we need to add the fact that

$$\frac{dx_1}{dt} = x_2 \tag{3.5}$$

which states that the velocity at any instant is the derivative of the distance
travelled. The two equations (3.4) and (3.5) completely describe the motion of
the car. In practice there will be limits on the sizes of the forces $u_1$ and $u_2$ for
obvious practical reasons. At this stage it's worth rewriting (3.4) and (3.5) in the
form

$$\begin{bmatrix} dx_1/dt \\ dx_2/dt \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 1/m & -1/m \end{bmatrix} \begin{bmatrix} u_1 \\ u_2 \end{bmatrix} \tag{3.6}$$

To construct (3.6) from its constituent parts, we have used the rule given in 
equations (1.59) and (1.60) in Chapter 1 for multiplying together a matrix and a 
vector. A compact matrix notation for (3.6) is 


$$\frac{dx}{dt} = Ax + Bu \tag{3.7}$$


where in this example 


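$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 0 \\ 1/m & -1/m \end{bmatrix}, \quad u = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}$$

Notice that the derivative dx/dt of the vector x consists of the vector with
components $dx_1/dt$, $dx_2/dt$. In (3.7) we call x the state vector, and u the control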
vector, and their components the state variables and control variables, 
respectively. The state variables are so called because they tell us what the state 
of the system is. 
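
To see the state equations of the car at work, they can be stepped forward in time on a computer. The following Python fragment is a rough sketch only, assuming constant forces and a simple forward-Euler step; the function names, the step length and the numerical values are illustrative assumptions.

```python
def car_rates(state, controls, mass):
    """Right-hand side of dx/dt = Ax + Bu for the car model.

    state = (x1, x2) = (distance, velocity);
    controls = (u1, u2) = (throttle force, brake force).
    """
    x1, x2 = state
    u1, u2 = controls
    return (x2, (u1 - u2) / mass)


def simulate_car(mass, controls, duration, dt=0.01):
    """Forward-Euler integration from rest with constant control forces."""
    state = (0.0, 0.0)
    for _ in range(round(duration / dt)):
        dx1, dx2 = car_rates(state, controls, mass)
        state = (state[0] + dt * dx1, state[1] + dt * dx2)
    return state


# Hypothetical numbers: 1000 kg car, 2000 N throttle force, no braking, 10 s.
distance, velocity = simulate_car(1000.0, (2000.0, 0.0), 10.0)
print(distance, velocity)
```

For a constant net force F the exact answers are velocity Ft/m and distance Ft²/2m, so the computed values should come out close to 20 and 100 respectively; the small discrepancy in the distance is the error of the Euler step.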


Control problems could be as follows:


(i) Starting from rest at some initial point, that is $x_1(0)=0$, $x_2(0)=0$, find
suitable functions $u_1(t)$, $u_2(t)$ so as to reach some given point $x_1(T)=a$
with zero final velocity $x_2(T)=0$ in the least possible time T (perhaps a
race from one set of green traffic lights to the next set at red!).

(ii) Alternatively, the objective might be to reach $x_1(T)=a$ whilst consuming
the least possible amount of fuel. To achieve this we would need to know
how fuel consumption depends upon the velocity and acceleration, but
clearly the optimum strategy would be quite different from that in (i),
where instinct tells us that full throttle will be needed at least some of the
time, but this will bring penalties in fuel consumption.

We'll be considering so-called optimal control problems like those in (i) and (ii),
where the objective is to do something in the ‘best possible’ way, in Chapter 4. 

To make the mathematical description of the car’s motion more realistic we 
could take into account other factors such as road friction, wind resistance, engine 
speed, and so on. We could imagine the car to be travelling in one lane of a 
motorway, where an objective might be to make sure we keep within a reasonable 
distance from the vehicle in front. Indeed, the day might not be too far away when 
a metal strip embedded in the roadway will be used to control vehicles on 
motorways. 


EXERCISE 3.2 List as many factors as you can think of which actually affect the motion 
of a car travelling along a road. Be imaginative: don’t ignore factors which have only 
a very slight effect — they are all present in reality! 


EXAMPLE 3.3


Mechanical systems involving springs are favourite models which can be used 
to illustrate ideas of control. Let’s see what principles are involved. Consider 
first a carriage of mass m which runs along smooth, straight, horizontal rails, 
and is connected by a spring to a fixed vertical support as shown in Figure 3.3. 


Figure 3.3 (labels: spring force F, spring length l + x)


When the spring is neither extended nor compressed its length is l, it exerts no
force and consequently there is no motion: the system is in equilibrium.
Suppose the mass is pulled a distance x to the right, as shown in Figure 3.3.
The spring is assumed to obey Hooke’s law, which states that the force F it
exerts on the mass is kx, where k is a constant for a given spring. Newton's law
(3.3) tells us that the equation describing the motion of the mass when it is let
go is

$$m \frac{d^2x}{dt^2} = -kx \tag{3.8}$$

since the acceleration is $f = d^2x/dt^2$. Notice that the force F exerted by the
spring on the mass is in the opposite direction to that in which x is increasing,
which accounts for the negative sign in (3.8). In other words, the spring pushes
back if compressed, and pulls back if stretched. If we write $k/m = \omega^2$ then (3.8)
becomes

$$\frac{d^2x}{dt^2} = -\omega^2 x \tag{3.9}$$

and you can easily verify by differentiating

$$x(t) = \alpha \cos \omega t + \beta \sin \omega t \tag{3.10}$$

twice that (3.10) is the solution of (3.9), where $\alpha$ and $\beta$ are constants. The
motion of the mass m described by (3.9) is oscillatory, that is, it moves
backwards and forwards, with period $2\pi/\omega$. This means that the displacement
of the mass from its equilibrium position at time t is the same at times
$t + 2n\pi/\omega$, for n=1,2,3,.... This behaviour is called simple harmonic motion.
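
The two claims just made, that the combination of cosine and sine satisfies the equation of motion and that it repeats with period 2π/ω, can be checked numerically as well as by differentiation. The Python sketch below is a rough check only: the second derivative is approximated by a central difference, and the function names and the values chosen for the constants are illustrative assumptions.

```python
import math


def shm(t, alpha, beta, omega):
    """x(t) = alpha*cos(omega*t) + beta*sin(omega*t)."""
    return alpha * math.cos(omega * t) + beta * math.sin(omega * t)


def second_derivative(f, t, h=1e-4):
    """Central-difference approximation to the second derivative of f at t."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h ** 2


# Hypothetical constants, chosen only for illustration.
alpha, beta, omega = 1.5, -0.5, 2.0
x = lambda t: shm(t, alpha, beta, omega)

for t in (0.0, 0.7, 2.3):
    # d2x/dt2 + omega^2 * x should be (very nearly) zero.
    print(t, second_derivative(x, t) + omega ** 2 * x(t))

period = 2 * math.pi / omega
print(x(0.3), x(0.3 + 3 * period))  # the same displacement three periods later
```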


EXERCISE 3.3 Show that for x(t) in (3.10)

$$x\left(t + \frac{2n\pi}{\omega}\right) = x(t)$$

for all positive integers n.


Figure 3.4 


Suppose now that a force u is applied to the carriage, as shown in Figure 3.4.
Adding this to (3.8) produces the new equation of motion

$$m \frac{d^2x}{dt^2} = -kx + u \tag{3.11}$$

As in Example 3.2, let’s now denote the distance x from equilibrium by $x_1$, and let
the velocity dx/dt of the carriage at time t be $x_2$ as in (3.5). The single second-order
equation (3.11) (so called because it contains a second-order derivative) can be
converted into two first-order equations, just as we did in the previous example. We
can write

$$\frac{d^2x}{dt^2} = \frac{d}{dt}\left(\frac{dx}{dt}\right) = \frac{dx_2}{dt}$$

so that (3.11) becomes

$$m \frac{dx_2}{dt} = -k x_1 + u$$

Combining this equation with (3.5) gives us

$$\frac{d}{dt}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \underbrace{\begin{bmatrix} 0 & 1 \\ -k/m & 0 \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ 1/m \end{bmatrix}}_{B} u \tag{3.12}$$


Again, this is in the form (3.7), where now there is just a single control variable u. 

Let’s now suppose a second carriage having mass $m_2$ is connected by another
spring to the first carriage (now labelled $m_1$) as shown in Figure 3.5. An external
force u is applied to the right-hand end.

The two springs have constants $k_1$ and $k_2$ respectively, and the forces they exert
are denoted by $F_1$ and $F_2$. Let the displacements of the two carriages from
equilibrium be $x_1$ and $x_2$. Suppose the first spring is extended by an amount $x_1$, so the
force it exerts is $F_1 = k_1 x_1$. The net extension of the second spring is $x_2 - x_1$, since its
right-hand end moves a distance $x_2$ and its left-hand end a distance $x_1$ (if $x_2 < x_1$ this
means the second spring has a net compression, so the forces $F_2$ act in opposite
directions to those shown in Figure 3.5). Newton’s law of motion (3.3) now
produces

$$m_1 \frac{d^2x_1}{dt^2} = F_2 - F_1 = k_2(x_2 - x_1) - k_1 x_1 \tag{3.13}$$

for the first mass, and

$$m_2 \frac{d^2x_2}{dt^2} = -F_2 + u = -k_2(x_2 - x_1) + u \tag{3.14}$$

for the second mass. Notice that the accelerations $d^2x_1/dt^2$ and $d^2x_2/dt^2$ carry the
same signs as $x_1$ and $x_2$ respectively; forces which act in the opposing direction carry
negative signs. Since there are now two second-order equations (3.13) and (3.14) we
need to introduce two additional variables, which are the velocities

$$x_3 = \frac{dx_1}{dt}, \qquad x_4 = \frac{dx_2}{dt} \tag{3.15}$$




Figure 3.5 
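of the two carriages. Substituting into (3.13) gives us

$$m_1 \frac{dx_3}{dt} = -(k_1 + k_2)x_1 + k_2 x_2 \tag{3.16}$$

and from (3.14) we obtain

$$m_2 \frac{dx_4}{dt} = k_2 x_1 - k_2 x_2 + u \tag{3.17}$$

You should verify that the equations (3.15), (3.16) and (3.17) can be written in the
combined form

$$\frac{d}{dt}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \underbrace{\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ -(k_1+k_2)/m_1 & k_2/m_1 & 0 & 0 \\ k_2/m_2 & -k_2/m_2 & 0 & 0 \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ 0 \\ 0 \\ 1/m_2 \end{bmatrix}}_{B} u \tag{3.18}$$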


which again gives us (3.7) with an appropriate interpretation of x, A, B and u. 
Indeed, this expression (3.7) is our general model of a linear control system when 


time is measured continuously. In general if there are n state variables $x_1, \ldots, x_n$ and
m control variables $u_1, u_2, \ldots, u_m$ then A is a square n × n matrix and B is an n × m
matrix. 
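
As an illustration of how such a model might be set up on a computer, the Python sketch below builds the matrices A and B for the two-carriage system from the masses and spring constants, and evaluates dx/dt = Ax + Bu at one state. The function names and the numerical values are assumptions made purely for the example.

```python
def two_carriage_matrices(m1, m2, k1, k2):
    """Return (A, B) for the two-carriage system with states
    (x1, x2, x3, x4) = (displacements, then velocities)."""
    A = [
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
        [-(k1 + k2) / m1, k2 / m1, 0.0, 0.0],
        [k2 / m2, -k2 / m2, 0.0, 0.0],
    ]
    B = [0.0, 0.0, 0.0, 1.0 / m2]  # single control variable u
    return A, B


def state_derivative(A, B, x, u):
    """Evaluate dx/dt = Ax + Bu for a single control variable."""
    return [sum(a * xi for a, xi in zip(row, x)) + b * u
            for row, b in zip(A, B)]


# Hypothetical values: m1 = 2, m2 = 4, k1 = 3, k2 = 5.
A, B = two_carriage_matrices(2.0, 4.0, 3.0, 5.0)
rates = state_derivative(A, B, [1.0, 2.0, 3.0, 4.0], 8.0)
print(rates)
```

Here A has n = 4 rows and columns and B has n = 4 rows and m = 1 column, in line with the general description above.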


EXERCISE 3.4 Consider the mechanical system shown in Figure 3.6. This consists of 
two masses m, and m, which hang vertically from a fixed support to which they are 


Figure 3.6 
connected by three springs as shown. The forces exerted by the springs are $F_1$, $F_2$
and $F_3$ when the downwards displacements of the masses from the equilibrium
position are $x_1$ and $x_2$. The spring constants have the values shown in the figure. If
$m_1 = 1$, $m_2 = 4$, write down Newton’s equation of motion for each of the two masses
(you do not have to take gravitational forces into account). If the velocities are
$dx_1/dt = x_3$, $dx_2/dt = x_4$ show that the equations can be written in the form (3.7) as


follows: 
=) 0 0 1 O][x,] [0 
0 0 01 0 
Colle) ae i) 
a| x5 |" |-6 30 off, |*}o|" (3-19) 
x| | 2 -G+H4 0 Olfx,| [4 


EXAMPLE 3.4


In the mechanical system shown in Figure 3.3 we ignored all frictional forces so 
that when the mass is set moving it performs oscillatory motions which 
continue indefinitely. In practice there will be resisting or damping forces which 
oppose the motion. For example, if a car was supported only on springs then 
after hitting a bump in the road it would bounce up and down in a highly 
uncomfortable way. For this reason the suspension system contains damping 
elements called ‘shock absorbers’ which cause the oscillatory motion induced 
by going over a bump to die away rapidly. Of course, in practice the elasticity of 
the tyres must also be taken into account.


Figure 3.7 (label: damping force R)


In Figure 3.7 the damper is shown as exerting a force R opposing the 
motion (the diagrammatic representation is of a piston in a cylinder). It is a 
reasonable approximation to reality to assume that the damping force is 
proportional to the relative velocity, so that here R= pdx/dt, where p (>0) is 
the damping constant. Ignore for the present the force u shown in Figure 3.7. 
The equation of motion (3.8) now becomes 


$$m \frac{d^2x}{dt^2} = -kx - p \frac{dx}{dt} \tag{3.20}$$


EXERCISE 3.5 Consider equation (3.20) with m=1, p=4 and k=5. Verify that

$$x(t) = e^{-2t}(\alpha \cos t + \beta \sin t) \tag{3.21}$$


where $\alpha$ and $\beta$ are arbitrary constants, is the general solution of the equation by
finding dx/dt and $d^2x/dt^2$ and substituting into (3.20).


In the preceding exercise it follows from (3.21) that $x(t) \to 0$ as $t \to \infty$, since the
exponential term certainly has this property, and the term $\alpha \cos t + \beta \sin t$ can
never exceed $|\alpha| + |\beta|$ because $|\cos t| \le 1$, $|\sin t| \le 1$ ($|\alpha|$ is the modulus of $\alpha$,
defined in Section 1.2, Chapter 1). In fact it can be shown in general that the solution
of (3.20) always tends to zero as t becomes larger and larger provided k and p are
both positive (see Problem 3.1). In other words, for a damped system like that in
Figure 3.7, oscillations always die away and the system returns to rest after a
sufficiently long time; the larger the value of p, the more rapidly do oscillations
decay.
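
The decay can also be confirmed numerically. The sketch below (Python) takes the damped equation with m=1, p=4 and k=5, that is d²x/dt² + 4 dx/dt + 5x = 0, together with the solution e^(−2t)(α cos t + β sin t), and checks both that the equation is satisfied and that the bound on |x(t)| holds. The derivatives are approximated by central differences, and the function names and the values of α and β are illustrative assumptions.

```python
import math


def damped_x(t, alpha, beta):
    """x(t) = exp(-2t) * (alpha*cos t + beta*sin t)."""
    return math.exp(-2 * t) * (alpha * math.cos(t) + beta * math.sin(t))


def residual(t, alpha, beta, h=1e-4):
    """Approximate value of x'' + 4x' + 5x at time t (should be near zero)."""
    x = lambda s: damped_x(s, alpha, beta)
    d1 = (x(t + h) - x(t - h)) / (2 * h)
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / h ** 2
    return d2 + 4 * d1 + 5 * x(t)


# Hypothetical constants, chosen only for illustration.
alpha, beta = 2.0, -1.0
for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    bound = math.exp(-2 * t) * (abs(alpha) + abs(beta))
    print(t, damped_x(t, alpha, beta), bound, residual(t, alpha, beta))
```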


EXERCISE 3.6 Suppose that a force u(t) is applied to the carriage as shown in Figure 3.7,
so that a term u is added to the right-hand side in equation (3.20). Using the numerical
values of m, p and k in Exercise 3.5, by taking $x = x_1$ and $dx/dt = x_2$ show that (3.20)
can be written in the form (3.7) with

$$x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 \\ -5 & -4 \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{3.22}$$


EXAMPLE 3.5


Let's have another look at the simple model of a national economy introduced 
in Exercise 1.10 and considered further in Problem 1.18. The national income $I_k$
at year k was found to satisfy the equation

$$I_{k+2} - \alpha(1+\beta) I_{k+1} + \alpha\beta I_k = G_{k+2}, \qquad k = 0, 1, 2, \ldots \tag{3.23}$$

where $\alpha$ and $\beta$ are constants and $G_k$ is the government expenditure in year k.
Here the aim is to determine how government expenditure should be planned 
so as to produce a desirable pattern of national income. Indeed, finance 
ministers around the world would dearly like to know how this could be 
achieved — the ‘science’ of mathematical economics is still in its infancy. For 
example, it was found in Problem 1.18 that (under the assumptions of the 
model as set out in Exercise 1.10) with government expenditure kept constant 
the national income behaves in an oscillatory fashion, but eventually settles 
down to twice government expenditure — a result which would not be predicted 
by ‘common sense’ arguments. 


The equation (3.23) has been reintroduced to show you that models of control
systems can be either in the form of differential equations like those in Examples
3.2, 3.3 and 3.4 or in the form of difference equations like (3.1) or (3.23). The
general matrix description in the difference equation case, as compared with (3.7) for
the differential equation case, is 


x(k+1)=Ax(k)+Bu(k), k=0,1,2,... (3.24) 
This is just the matrix model (1.58) with the control term Bu(k) added on. 
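
A model in this matrix difference form is very easy to iterate on a computer. The Python sketch below is illustrative only: the function names are our own, and the matrices are an assumed version of a two-state national income model with constant government expenditure, using assumed values α = 0.5, β = 1 and G = 100.

```python
def step(A, B, x, u):
    """One step of x(k+1) = A x(k) + B u(k) for a single control variable."""
    return [sum(a * xi for a, xi in zip(row, x)) + b * u
            for row, b in zip(A, B)]


def simulate(A, B, x0, inputs):
    """Return the list of states x(0), x(1), ... produced by the inputs."""
    states = [list(x0)]
    for u in inputs:
        states.append(step(A, B, states[-1], u))
    return states


# Assumed illustrative model: states are national income in years k and k+1.
alpha, beta, G = 0.5, 1.0, 100.0
A = [[0.0, 1.0], [-alpha * beta, alpha * (1 + beta)]]
B = [0.0, 1.0]
incomes = [s[0] for s in simulate(A, B, [0.0, 0.0], [G] * 60)]
print(incomes[:8], incomes[-1])
```

With these assumed numbers the income overshoots and oscillates before settling close to 200, that is, twice the constant expenditure.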


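EXERCISE 3.7 By setting $x_1(k) = I_k$, $x_2(k) = I_{k+1}$, show that (3.23) can be expressed in the
form (3.24) with

$$x(k) = \begin{bmatrix} x_1(k) \\ x_2(k) \end{bmatrix}, \quad A = \begin{bmatrix} 0 & 1 \\ -\alpha\beta & \alpha(1+\beta) \end{bmatrix}, \quad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \quad u(k) = G_{k+2}$$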


EXAMPLE 3.6


If you take a dose of medicine it first enters your gastrointestinal tract. From 
there it is distributed throughout your bloodstream to be metabolized and 
eventually eliminated. At a particular instant of time t, let $x_1$ be the mass of
drug in the gastrointestinal tract, let $x_2$ be the mass of drug in the bloodstream,
and let u be the rate at which the drug is taken (these variables are all functions
of t). Then the process is described by

$$\frac{dx_1}{dt} = -k_1 x_1 + u$$

where the first term on the right arises from the passage of drug into the
bloodstream and the second is the rate of drug ingestion, and

$$\frac{dx_2}{dt} = k_1 x_1 - k_2 x_2$$

that is, the rate of increase of drug in the bloodstream is the rate of receipt of
drug into the bloodstream less the rate of drug excretion. Here $k_1$ and $k_2$ are
positive constants depending upon characteristics of the
body. If these equations are written in the form (3.7) then you should verify
that

$$A = \begin{bmatrix} -k_1 & 0 \\ k_1 & -k_2 \end{bmatrix}, \quad B = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$


The objective is to determine the rate u at which the drug should be given to a 
patient so as to control the amount of the drug in the bloodstream according to 
medically desirable levels. 
EXERCISE 3.8 Suppose in the situation described in Example 3.6 there are initial amounts
of $x_1(0)=a$ and $x_2(0)=b$ of drug present. If no further doses of drug are administered,
verify that the equations in Example 3.6 are satisfied by

$$x_1(t) = a e^{-k_1 t}$$

$$x_2(t) = b e^{-k_2 t} + \frac{k_1 a}{k_1 - k_2}\left(e^{-k_2 t} - e^{-k_1 t}\right)$$

assuming $k_1 \neq k_2$. This shows that $x_1(t) \to 0$, $x_2(t) \to 0$ as $t \to \infty$.
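
Before attempting the verification by hand, the formulae can be compared with a direct numerical solution. The Python sketch below steps the two drug equations forward with no dosing (u = 0) using a simple Euler scheme and compares the result with the closed-form expressions a e^(−k₁t) and b e^(−k₂t) + k₁a(e^(−k₂t) − e^(−k₁t))/(k₁ − k₂). The function names, the step length and all numerical values are assumptions made for the illustration.

```python
import math


def simulate_drug(a, b, k1, k2, t_end, dt=1e-3):
    """Forward-Euler solution of the drug model with no dosing (u = 0)."""
    x1, x2 = a, b
    for _ in range(round(t_end / dt)):
        dx1 = -k1 * x1                # drug leaving the gastrointestinal tract
        dx2 = k1 * x1 - k2 * x2       # receipt into blood minus excretion
        x1, x2 = x1 + dt * dx1, x2 + dt * dx2
    return x1, x2


def exact_drug(a, b, k1, k2, t):
    """Closed-form solution for k1 != k2."""
    x1 = a * math.exp(-k1 * t)
    x2 = (b * math.exp(-k2 * t)
          + k1 * a / (k1 - k2) * (math.exp(-k2 * t) - math.exp(-k1 * t)))
    return x1, x2


# Hypothetical values: a = 10, b = 2, k1 = 1.0, k2 = 0.5, final time t = 5.
approx = simulate_drug(10.0, 2.0, 1.0, 0.5, 5.0)
exact = exact_drug(10.0, 2.0, 1.0, 0.5, 5.0)
print(approx, exact)
```

The two pairs of numbers should agree to within the error of the Euler step, and both amounts are well on their way towards zero.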


EXAMPLE 3.7


An oven has a rectangular cross-section as shown in Figure 3.8. 


Figure 3.8 (label: Heating element)


The objective is to control the temperature 7, of the interior of the oven by 
varying the heat input u to the jacket, which is insulated. The energy is supplied 
by means of an electric heating element. Let 7; and T, denote the temperatures 
of the jacket and external surroundings respectively. The oven interior gains 
heat solely by radiation from the inside surface of the jacket. The rate at which 
this energy is radiated is proportional to the temperature difference T,- 7, and 
also depends upon the interior surface area a, of the jacket. The rate of increase 
of heat energy of the oven interior is therefore given by 


dj 
c ae aT - 7) (3.25) 
where ¢, is the heat capacity of the oven interior, and r, is called the radiation 
coefficient of the surface. 
In a similar way the jacket also radiates heat to the external surroundings, 
so the ‘heat balance’ equation for the jacket is 


-an(F-T) - arit-T) + u (3.26) 
rate of heat rate of heat heat 
loss to oven Joss to external input 


interior surroundings 
where ¢, is the heat capacity of the jacket, a, is the area of the external surface
of the jacket and r, is its radiation coefficient. It is assumed that T, is constant. 
The heat input u to the jacket is varied by altering the current through the coil. 
As this input is not applied directly to the oven interior, a first question to ask is 
whether it is indeed possible to maintain the oven interior temperature T, at any 
desired level by appropriately altering u. This is an illustration of what is meant 
by ‘controllability’ of a system, and will be investigated in Section 3.2 (in 
particular, see Example 3.10). A second question is whether the value of the 
oven interior temperature 7, can be determined even if it cannot be measured 
directly - perhaps we can only measure the jacket temperature T;. This is an 
example of the problem of ‘observability’ of a system, and will be studied in 
Section 3.3 (in particular, see Example 3.13). We shall also find that although 
the two questions seem to involve quite different aspects of physical reality, 
they are actually closely related in mathematical terms. 


EXERCISE 3.9 When a deep-sea diver is brought up to the surface this is done by 
attaching the diver to a cable which is operated by a winch. Assuming that the motion 
takes place entirely vertically, then at a depth h below the surface the equation of
motion is obtained from Newton’s law as 

m ee ah _¢ 
dr? dt 
where m is the mass of the diver, v is the volume of the diver, ρ is the density of
water, μ is a positive drag coefficient, f is the force exerted by the cable and g is the
gravitational constant. Let p denote the internal body pressure of the diver relative to
atmospheric pressure at sea level. It is important for the health of the diver to avoid 
large changes in the body pressure p whilst being raised to the surface by the cable. It 
can be shown that 


aneran 
at K(ph - p) 
where k is a positive constant characterizing the body tissue. Define as state variables 


$$x_1 = h, \quad x_2 = \frac{dh}{dt}, \quad x_3 = p$$


and let the control variable be 
y= ™eapu-f 
m 
Write the equations in the matrix form (3.7) where x is the vector with components 


$x_1$, $x_2$, $x_3$.


EXERCISE 3.10 An overhead crane of mass M moves along a horizontal track, and its 
distance at time t from a fixed reference point is s. A grab of mass m is attached to 
the crane by a rod whose mass can be neglected, as shown in Figure 3.9. The angle θ


Figure 3.9 


made by this rod to the vertical is assumed small, in which case the equations of 
motion turn out to be


2, 
M £2 +(m+Mg0-+u=0 
dt 
2 
M £3 —mgo-u=0 
dt 


where u is a control force. Taking M=1, m=0.1 write these equations in the matrix 
form (3.7) by taking as state variables 
$$x_1 = \theta, \quad x_2 = \frac{d\theta}{dt}, \quad x_3 = s, \quad x_4 = \frac{ds}{dt}$$


EXERCISE 3.11 Return to the model of the buffalo population in the American west 


described in Problem 1.28. It was found that 
$$F_{k+2} = 0.95F_{k+1} + 0.12F_k \tag{3.27}$$
$$M_{k+2} = 0.95M_{k+1} + 0.14F_k, \qquad k = 0, 1, 2, \ldots \tag{3.28}$$


where $F_k$ and $M_k$ are the numbers of female and male buffalo at the start of year k,
where k=0 corresponds to 1830. You were asked to show that under natural 
conditions the numbers of animals would continue to grow indefinitely. In fact owing 
to indiscriminate slaughter by white settlers who were only interested in buffalo hides, 
the number of animals was reduced from an estimated 60 million in 1830 to just a few 
hundred only 60 years later. Suppose that a policy of strictly controlled slaughter had 
been adopted, whereby a number of adult females were killed for food each year. 
This is equivalent to an extra control term —u(k) on the right-hand side of equation 
(3.27). Define the state variables 


$$x_1(k) = F_k, \quad x_2(k) = F_{k+1}, \quad x_3(k) = M_k, \quad x_4(k) = M_{k+1}$$
and hence write (3.27) and (3.28) in the form (3.24).


3.2 CONTROLLABILITY


Politicians the world over would desperately like to be able to control the economy 
of their country in ways they think desirable — especially ways which appeal to 
voters! For example, to keep down the rate of inflation, to reduce unemployment and 
to raise living standards are all admirable but somehow elusive targets. Fashionable 
ideas change: one party claims that controlling the supply of money is the answer, 
another relies on manipulating the interest rate, whilst a third calls for more 
government investment. There is a tendency for politicians to use jargon which 
confuses the electorate, such as ‘the unemployment rate is a lagging indicator’, but 
this is really a smokescreen designed to obscure the real lack of understanding of 
how to control an economy. In fact the first question to be asked is: can the results 
we wish to obtain be actually achieved by altering the factors we have selected? For 
example, can inflation be controlled by altering interest rates? Can unemployment be 
brought down by reducing taxes? Can living standards be raised by increasing 
investment? All these are questions of controllability; can a system be compelled to 
behave in a certain way by altering the variables which we have selected as the 
inputs? If the answer is ‘yes’ then we can go about devising suitable control schemes 
which will do the job; but if the answer is ‘no’ then any hope of constructing a 
successful strategy for control is doomed to failure. In this latter case we must alter 
the way in which control is applied, perhaps by selecting a different set of control 
variables, so as to make sure we get a set-up which is controllable. 
To investigate controllability it’s easier to begin with difference equations. 


EXAMPLE 3.8


Let's look at a discrete time model in the form (3.24), namely 
$$x(k+1) = Ax(k) + Bu(k), \qquad k = 0, 1, 2, \ldots \tag{3.29}$$

and suppose for simplicity there are two state variables $x_1(k)$ and $x_2(k)$, and a
single control variable. This means that in (3.29) A is a 2 × 2 matrix and B is a
2 × 1 column vector which we will denote by b to emphasize that it is a vector.
Suppose our objective is to drive the system from a given initial state x(0) to a
given final state $x_f$ in two units of time, that is we want to make $x(2) = x_f$.
Setting k=0 in (3.29) gives us

$$x(1) = Ax(0) + bu(0)$$

and setting k=1 in (3.29) gives

$$\begin{aligned} x(2) &= Ax(1) + bu(1) \\ &= A[Ax(0) + bu(0)] + bu(1) \\ &= A^2x(0) + Abu(0) + bu(1) \end{aligned} \tag{3.30}$$

Our control problem is therefore to determine the values of the control u(0)
and u(1) such that the expression in (3.30) is equal to any given final state $x_f$.
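Let's write this out as

$$x_f = A^2x(0) + [b, Ab]\begin{bmatrix} u(1) \\ u(0) \end{bmatrix} \tag{3.31}$$

You should notice carefully how we were able to convert the terms
Abu(0) + bu(1) in (3.30) into the expression on the right in (3.31). It's worth
writing it out in full detail just this once: supposing that

$$b = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \qquad Ab = \begin{bmatrix} b_3 \\ b_4 \end{bmatrix}$$

then the term in (3.31) is

$$\begin{aligned} [b, Ab]\begin{bmatrix} u(1) \\ u(0) \end{bmatrix} &= \begin{bmatrix} b_1 & b_3 \\ b_2 & b_4 \end{bmatrix}\begin{bmatrix} u(1) \\ u(0) \end{bmatrix} \\ &= \begin{bmatrix} b_1u(1) + b_3u(0) \\ b_2u(1) + b_4u(0) \end{bmatrix} \\ &= \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}u(1) + \begin{bmatrix} b_3 \\ b_4 \end{bmatrix}u(0) \\ &= bu(1) + Abu(0) \end{aligned}$$

which agrees with the original expression in (3.30).
We can arrange (3.31) into the form

$$[b, Ab]\begin{bmatrix} u(1) \\ u(0) \end{bmatrix} = x_f - A^2x(0)$$

and then invert the matrix U = [b, Ab] to obtain

$$\begin{bmatrix} u(1) \\ u(0) \end{bmatrix} = U^{-1}[x_f - A^2x(0)] \tag{3.32}$$

provided that the matrix $U^{-1}$ exists. If this is the case then whatever the final
state $x_f$ we can use (3.32) to compute control values u(0) and u(1) which will
indeed steer the system from any initial state x(0) to $x(2) = x_f$.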
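
The recipe of inverting U = [b, Ab] and applying it to the difference between the target state and A²x(0) is short enough to try out directly. The Python sketch below does this with exact fractions for a made-up system; the function names, the matrix A, the vector b and the initial and final states are all hypothetical values chosen only to illustrate the calculation.

```python
from fractions import Fraction


def mat_vec(A, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def two_step_controls(A, b, x0, xf):
    """Return (u0, u1) such that x(2) = xf, for two states and one control."""
    Ab = mat_vec(A, b)
    # U = [b, Ab], i.e. the matrix [[b[0], Ab[0]], [b[1], Ab[1]]]
    det = b[0] * Ab[1] - Ab[0] * b[1]
    if det == 0:
        raise ValueError("U is singular: the system is not controllable")
    A2x0 = mat_vec(A, mat_vec(A, x0))
    r = [xf[0] - A2x0[0], xf[1] - A2x0[1]]
    # Apply the 2 x 2 inverse formula to U; the result is [u(1), u(0)].
    u1 = (Ab[1] * r[0] - Ab[0] * r[1]) / det
    u0 = (-b[1] * r[0] + b[0] * r[1]) / det
    return u0, u1


# Hypothetical data, chosen only for illustration.
A = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(4)]]
b = [Fraction(0), Fraction(1)]
x0 = [Fraction(1), Fraction(1)]
xf = [Fraction(0), Fraction(0)]

u0, u1 = two_step_controls(A, b, x0, xf)

# Replay the difference equation to confirm that x(2) equals xf.
state1 = [p + q * u0 for p, q in zip(mat_vec(A, x0), b)]
state2 = [p + q * u1 for p, q in zip(mat_vec(A, state1), b)]
print(u0, u1, state2)
```

Replaying the difference equation with the computed controls is a useful habit: it confirms that the target really is reached after two steps, independently of the algebra.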


We have already encountered the concept of the inverse matrix $U^{-1}$ in Section
1.4, Chapter 1. Recall that

$$UU^{-1} = U^{-1}U = I$$

where I is the unit matrix having ones along the principal diagonal. We gave the
formula for the inverse of a 2 × 2 matrix in (1.80), repeated here for convenience:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix} \tag{3.33}$$

provided the determinant $ad - bc \neq 0$. This latter condition is required for a matrix to
have an inverse, in which case the matrix is called non-singular (otherwise it is
singular). We can therefore say that the system (3.29) is controllable provided the
2 × 2 controllability matrix U = [b, Ab] is non-singular, that is

$$\det U = ad - bc \neq 0 \tag{3.34}$$

EXERCISE 3.12 Consider a system described by the difference equation 
xeene| ¢ thew, =0)1;2, 1 (3.35) 
(a) If 
of] 
show that 


=T 
(b) If a control term


[owe 


is added onto the right-hand side in (3.35), determine the controllability matrix 
U, and show that the system is controllable. 

(c) Use (3.32) to find the values of u(0) and u(1) which send the system from the 
initial state x(0) in part (a) to the final state 


orf 


EXERCISE 3.13 Consider the system described by 


xkee| | Zhen | ue. k=0,1,2,.. 


or 


Find for what values of the parameter a the system is not controllable. 


Let’s return to (3.29) and suppose that there are now three state variables and a 
single control variable, so we have 

x(k+1)=Ax(k)+ bu(k), k=0,1,2,3,... (3.36) 
where A is a 3 × 3 matrix and b is a 3 × 1 column vector. As before, set k=0, 1 in
(3.36) to get the expression for x(2) in (3.30), but we now go one step further and
take k=2 which gives from (3.36)

$$\begin{aligned} x(3) &= Ax(2) + bu(2) \\ &= A^3x(0) + A^2bu(0) + Abu(1) + bu(2), \quad \text{using (3.30)} \\ &= A^3x(0) + [b, Ab, A^2b]\begin{bmatrix} u(2) \\ u(1) \\ u(0) \end{bmatrix} \end{aligned} \tag{3.37}$$




Using the same argument as for the 2 x 2 case, it follows that we can obtain a control 
sequence u(0), u(1), u(2) which sends the system to any final state $x(3) = x_f$
provided $U^{-1}$ exists. Hence for the system (3.36) to be controllable we must have
$\det U \neq 0$ exactly as in (3.34), but now U is the 3 × 3 controllability matrix

$$U = [b, Ab, A^2b] \tag{3.38}$$

Notice that to compute the third column $A^2b$ in (3.38) we do not need $A^2$. First work
out Ab, and then use

$$A^2b = A(Ab)$$


A formula for evaluating a 3 x3 determinant in terms of three 2 x2 determinants 
was given in Chapter 1, equations (1.86) and (1.87). 


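EXAMPLE 3.9 

Let's investigate whether the system described by 

$$x(k+1) = \underbrace{\begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 4 \\ -5 & 6 & -2 \end{bmatrix}}_{A} x(k) + \underbrace{\begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}}_{b} u(k), \quad k = 0, 1, 2, \ldots \qquad (3.39)$$

is controllable. Applying the usual rule (set out in (1.60)) for multiplying 
together a matrix and a vector we get 

$$Ab = \begin{bmatrix} 1 \times 1 - 1 \times 0 + 2 \times 1 \\ 3 \times 1 + 0 \times 0 + 4 \times 1 \\ -5 \times 1 + 6 \times 0 - 2 \times 1 \end{bmatrix} = \begin{bmatrix} 3 \\ 7 \\ -7 \end{bmatrix}$$

and similarly 

$$A^2 b = A(Ab) = A \begin{bmatrix} 3 \\ 7 \\ -7 \end{bmatrix} = \begin{bmatrix} 1 \times 3 - 1 \times 7 - 2 \times 7 \\ 3 \times 3 + 0 \times 7 - 4 \times 7 \\ -5 \times 3 + 6 \times 7 + 2 \times 7 \end{bmatrix} = \begin{bmatrix} -18 \\ -19 \\ 41 \end{bmatrix}$$

The controllability matrix in (3.38) is therefore obtained by writing the three 
columns $b$, $Ab$, $A^2 b$ side by side to give 

$$U = \begin{bmatrix} 1 & 3 & -18 \\ 0 & 7 & -19 \\ 1 & -7 & 41 \end{bmatrix}$$

and we need to evaluate the determinant of $U$ using (1.86). This gives 

$$\begin{aligned} \det U &= 1 \begin{vmatrix} 7 & -19 \\ -7 & 41 \end{vmatrix} - 3 \begin{vmatrix} 0 & -19 \\ 1 & 41 \end{vmatrix} + (-18) \begin{vmatrix} 0 & 7 \\ 1 & -7 \end{vmatrix} \\ &= (7 \times 41 - 7 \times 19) - 3(1 \times 19) - 18(-1 \times 7) \\ &= 154 - 57 + 126 = 223 \end{aligned}$$

which is non-zero, showing that (3.39) is controllable. 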
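
The arithmetic of Example 3.9 can be checked mechanically. The sketch below is an illustration in Python rather than anything prescribed by the text: it builds the columns of $U$ by repeatedly multiplying the previous column by $A$, and evaluates the 3 × 3 determinant by expansion along the first row. The function names are our own.

```python
def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def controllability_matrix(A, b):
    """Return U = [b, Ab, ..., A^(n-1) b] as a list of rows."""
    n = len(A)
    cols = [list(b)]
    for _ in range(n - 1):
        # Each new column is A times the preceding one; no powers of A needed.
        cols.append(mat_vec(A, cols[-1]))
    # The columns were built one by one; transpose them into rows.
    return [[col[i] for col in cols] for i in range(n)]

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

A = [[1, -1, 2], [3, 0, 4], [-5, 6, -2]]
b = [1, 0, 1]

U = controllability_matrix(A, b)
print(U)        # rows of the controllability matrix
print(det3(U))  # non-zero means the system is controllable
```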
EXERCISE 3.14 Test the system (3.36) for controllability when 


EXERCISE 3.15 Use (3.37) to show that for a controllable system (3.36) the control 
sequence $u(0), u(1), u(2)$ which sends the system from $x(0)$ to any given state $x_f$ 
after three units of time is 

$$\begin{bmatrix} u(2) \\ u(1) \\ u(0) \end{bmatrix} = U^{-1} [x_f - A^3 x(0)] \qquad (3.40)$$

where $U^{-1}$ is the inverse of the controllability matrix (3.38). 


The preceding exercise shows that for a controllable system with three state 
variables we can get to any final state in three units of time; previously we saw in 
(3.32) that with two state variables, two units of time are required. It therefore 
comes as no surprise that when there are $n$ state variables and a single control 
variable, so in (3.36) $A$ is $n \times n$ and $b$ is $n \times 1$, the following result holds: 

Provided the $n \times n$ controllability matrix defined by 

$$U = [b, Ab, A^2 b, A^3 b, \ldots, A^{n-1} b] \qquad (3.41)$$

is non-singular (i.e. $\det U \neq 0$), then a control sequence $u(0), u(1), u(2), \ldots, 
u(n-1)$ can be found which drives the system (3.36) from any initial state to any 
final state in $n$ units of time. 
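
To see the result in action, here is a hedged Python sketch that finds such a control sequence by solving $U z = x_f - A^n x(0)$, where $z$ lists the controls in the order $u(n-1), \ldots, u(0)$ as in (3.37). It uses the matrices of Example 3.9; the initial and final states are hypothetical, and the small Gauss-Jordan solver is our own choice of method, not one taken from the text.

```python
from fractions import Fraction

def mat_vec(A, v):
    """Multiply the matrix A (list of rows) by the column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def solve(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular, so the system is not controllable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

def control_sequence(A, b, x0, x_target):
    """Return [u(0), ..., u(n-1)] driving x0 to x_target in n steps."""
    n = len(A)
    cols = [list(b)]
    for _ in range(n - 1):
        cols.append(mat_vec(A, cols[-1]))
    U = [[col[i] for col in cols] for i in range(n)]
    free = list(x0)                      # becomes A^n x(0)
    for _ in range(n):
        free = mat_vec(A, free)
    z = solve(U, [t - f for t, f in zip(x_target, free)])
    return z[::-1]                       # z is ordered u(n-1), ..., u(0)

A = [[1, -1, 2], [3, 0, 4], [-5, 6, -2]]
b = [1, 0, 1]
x0 = [1, 0, 0]          # hypothetical initial state
x_target = [0, 0, 0]    # hypothetical final state

u = control_sequence(A, b, x0, x_target)

# Check by stepping the difference equation x(k+1) = A x(k) + b u(k).
x = list(x0)
for u_k in u:
    x = [ax + bi * u_k for ax, bi in zip(mat_vec(A, x), b)]
print(u, x)
```

The final loop is the useful part in practice: it confirms the computed controls by simulation rather than by trusting the algebra.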

Notice that, as was pointed out for $n = 3$, it is not necessary to work out powers 
of $A$ in order to compute the columns of $U$ in (3.41). Each column is obtained by 
multiplying the preceding one by $A$ as follows: 

$$A^2 b = A(Ab), \quad A^3 b = A(A^2 b), \quad A^4 b = A(A^3 b), \quad \ldots \qquad (3.42)$$

Of course we haven’t yet covered the evaluation of determinants when $n > 3$. We'll 
defer this until Section 3.5, where we’ll also see what happens when there are 
several control variables. In fact, even for the case of 3 × 3 determinants where we 
have so far used the formula (1.86) we’ll see that an improved method of evaluation 
is available. 

Let’s now turn to the case when time is regarded as continuous, so our linear 
system model consists of the matrix differential equation in (3.7), namely 

$$\frac{dx}{dt} = Ax + bu \qquad (3.43)$$

where again we are assuming at present that there is a single control variable, so $b$ is 
an $n \times 1$ column vector and $A$ is an $n \times n$ matrix, and $x(t)$, $u(t)$ are continuous 
functions of time $t$. It very often turns out that it is more complicated to deal with 
models using differential equations than those using difference equations, and this is 
certainly true when we try to establish the controllability condition for (3.43). We 
therefore shan’t attempt to prove this, but simply state the result, which actually 
turns out to be the same in mathematical terms! 

The system (3.43) is controllable, in the sense that there exists a control 
function $u(t)$ which transfers the system from any initial state $x(0)$ to any final state 
$x_f$ in a finite time, provided the controllability matrix $U$ defined in (3.41) is 
non-singular. However, this time there are no simple expressions like (3.32) or (3.40) for 
controls which will achieve the desired result. 


EXAMPLE 3.10 

Return to the electrically heated oven described in Example 3.7. Define as the 
state variables the excesses of temperature over that of the surroundings $T_0$, 
that is 

$$x_1 = T_1 - T_0, \quad x_2 = T_2 - T_0$$

Since $T_0$ is assumed constant we have 

$$\frac{dx_1}{dt} = \frac{dT_1}{dt}, \quad \frac{dx_2}{dt} = \frac{dT_2}{dt}$$

so (3.25) and (3.26) become 

$$C_1 \frac{dx_1}{dt} = a_1 r_1 (x_2 - x_1)$$

$$C_2 \frac{dx_2}{dt} = -a_1 r_1 (x_2 - x_1) - a_2 r_2 x_2 + u$$

where we have used the fact that $x_1 - x_2 = T_1 - T_2$. To avoid messy algebra, 
suppose $C_1 = 1$, $C_2 = 2$, $a_1 = 10$, $a_2 = 30$, $r_1 = 2$, $r_2 = 1$. 
You should check that we can then write these equations in the matrix form 

$$\frac{d}{dt} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \underbrace{\begin{bmatrix} -20 & 20 \\ 10 & -25 \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \underbrace{\begin{bmatrix} 0 \\ \tfrac{1}{2} \end{bmatrix}}_{b} u$$

The controllability matrix is 

$$U = [b, Ab] = \begin{bmatrix} 0 & 10 \\ \tfrac{1}{2} & -\tfrac{25}{2} \end{bmatrix}$$

and 

$$\det U = -10 \times \tfrac{1}{2} = -5 \neq 0$$

We therefore conclude that the system is controllable; that is, we can indeed 
raise the temperature of the oven interior from any initial value to any final 
value in a finite time merely by altering the current through the heating element 
in the jacket. 
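
The "you should check" step in Example 3.10 can also be done numerically. The sketch below is an illustration in Python; it assumes the parameter values are heat capacities 1 and 2, coefficients $a_1 = 10$, $a_2 = 30$ and $r_1 = 2$, $r_2 = 1$ (our reading of the example), builds $A$ and $b$ from the two heat-balance equations, and then evaluates $\det U$. The variable names are our own.

```python
from fractions import Fraction

# Parameter values as read from Example 3.10 (an assumption of this sketch).
C1, C2 = Fraction(1), Fraction(2)
a1, a2 = Fraction(10), Fraction(30)
r1, r2 = Fraction(2), Fraction(1)

# Divide each heat-balance equation by its capacity to get dx/dt = A x + b u.
A = [[-a1 * r1 / C1, a1 * r1 / C1],
     [a1 * r1 / C2, -(a1 * r1 + a2 * r2) / C2]]
b = [Fraction(0), 1 / C2]

# Controllability matrix U = [b, Ab] and its determinant.
Ab = [A[0][0] * b[0] + A[0][1] * b[1],
      A[1][0] * b[0] + A[1][1] * b[1]]
U = [[b[0], Ab[0]], [b[1], Ab[1]]]
det_U = U[0][0] * U[1][1] - U[0][1] * U[1][0]
print(A, U, det_U)
```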


EXERCISE 3.16 Verify that (a) the mechanical system in (3.12) is controllable, and (b) 
the drug model in Example 3.6 is controllable. 


EXERCISE 3.17 Determine for what values of the real parameter a the system 


is not controllable. 


EXERCISE 3.18 In Chapter 1 we described in Example 1.4 Fibonacci’s model for a 
population of rabbits. We saw that if rabbits live in ‘paradise’ (an infinitely large 
green island with no other animals or diseases) then their numbers will increase 
without limit. This time let’s use a continuous time model: denote by $x_1(t)$ the number 
of rabbits at time $t$. Suppose that growth is exponential, that is 

$$x_1(t) = e^{at} x_1(0) \qquad (3.44)$$

for some positive constant $a$, where $x_1(0)$ is the initial number of rabbits when 
counting begins at $t = 0$. Differentiating (3.44) with respect to $t$ gives 

$$\frac{dx_1}{dt} = a e^{at} x_1(0) = a x_1 \qquad (3.45)$$

Unfortunately for the rabbits, foxes are introduced onto the island. These carnivorous 
animals feed on the rabbits; indeed, if there were no rabbits on the island the foxes 
would simply die out, again at an exponential rate, so that exactly as in (3.45) we 
would have 

$$x_2(t) = e^{-bt} x_2(0), \qquad \frac{dx_2}{dt} = -b x_2 \qquad (3.46)$$

where $x_2(t)$ is the number of foxes at time $t$ and $b$ is another positive constant. We 
now assume that when rabbits and foxes cohabit on the island: 

(i) the rate of growth of the rabbit population is reduced by an amount 
proportional to the number of foxes (i.e. by an amount $-cx_2$); 

(ii) the rate of growth of the fox population is increased by an amount 
proportional to the number of rabbits (i.e. by an amount $dx_1$). 

These assumptions mean that (3.45) and (3.46) are replaced respectively by 

$$\frac{dx_1}{dt} = a x_1 - c x_2 \qquad (3.47)$$

$$\frac{dx_2}{dt} = d x_1 - b x_2 \qquad (3.48)$$

where $c$ and $d$ are also positive constants. In matrix form (3.47) and (3.48) can be 
written as 

$$\frac{d}{dt} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \underbrace{\begin{bmatrix} a & -c \\ d & -b \end{bmatrix}}_{A} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \qquad (3.49)$$

Depending upon the values of the parameters $a$, $b$, $c$ and $d$, the numbers of animals 
can increase without limit as $t \to \infty$. It is decided to attempt to control the 
environment by introducing a disease which is fatal to rabbits but does not harm foxes. This 
has the effect of adding a control term $u$ onto the right-hand side in (3.47). Test 
whether the resulting system is controllable, that is, whether applying control in 
this way will allow the numbers of both rabbits and foxes to be brought to desired 
levels. 


3.3 OBSERVABILITY 


The state variables tell us everything there is to know about a system at any 
particular time. For example, if a car is being driven along a road the state variables 
could include the speed, acceleration and position of the vehicle, its mass (which is 
decreasing due to consumption of fuel), the engine temperature and oil pressure, and 
so on. The values of all these things define the state of the car, and knowing what’s 
happening enables us to drive accordingly. In fact, it’s still possible to drive the car 
perfectly well without knowing, for example, what the oil pressure is — and, indeed, 
few cars have an oil pressure gauge these days. So for reasons of cost or practicabil- 
ity it’s very often the case that only certain variables, called outputs, are monitored 
and their values used to control the system. The problem of observability is whether 
it’s possible to determine the state of a system from a knowledge of its outputs (it is 
assumed that we know what inputs we are using). If the answer turns out to be ‘no’, 
so the system is not observable, then a different set of outputs will have to be 
selected for monitoring. 

Consider as another example the economy of a country: finding out its state is 
exceedingly difficult. The actual number of state variables is immense — for example, 
the daily income and expenditure of every individual is a crucial element of the total 
information. Clearly it is completely impractical to monitor all these constituents of 
the overall economic picture. Instead, key indicators such as the levels of imports 
and exports, the volume of manufacturing output, the amount of money in 
circulation, and so on, are measured and those in charge of economic affairs then try 
to determine what the true state of the economy is. 

To keep things simple we'll just look now at the case where there is a single 
output variable $y$, and leave the situation where there are several outputs until 
Section 3.5. We assume that the output $y$ is a linear combination of the states. This 
means (as in equation (1.67), Chapter 1) that 

$$y = c_1 x_1 + c_2 x_2 + \cdots + c_n x_n \qquad (3.50)$$

where $c_1, c_2, \ldots, c_n$ are constants. We can also write (3.50) as 

$$y = cx \qquad (3.51)$$

where $c$ is the $1 \times n$ row vector with components $c_1, \ldots, c_n$ and $x$ is the usual $n \times 1$ 
state vector with components $x_1, \ldots, x_n$. The expression $cx$ in (3.51) is called the 
scalar product of the two vectors $c$ and $x$. Once again it turns out to be easier to 
begin with the discrete time case, so in (3.50) we understand that each of the 
variables $y(k), x_1(k), \ldots, x_n(k)$ is defined for $k = 0, 1, 2, \ldots$. Recall that our model is 
the set of difference equations (3.29). We know completely what inputs we are using 
(i.e. the control sequence $u(0), u(1), u(2), \ldots$) so whatever these are won’t affect the 
observability question. It’s obviously going to simplify matters if we suppose our 
input is zero, that is $u(k) = 0$, $k \geq 0$. 
The equations describing the system are therefore just 

$$x(k+1) = Ax(k), \quad k = 0, 1, 2, \ldots \qquad (3.52)$$

Setting $k = 0$ in (3.51) gives 

$$y(0) = cx(0) \qquad (3.53)$$

and similarly $k = 1$ produces 

$$y(1) = cx(1) = cAx(0) \qquad (3.54)$$

since $x(1) = Ax(0)$ from (3.52). Continuing this process gives 

$$y(2) = cx(2) = cAx(1) = cA^2 x(0) \qquad (3.55)$$

and 

$$y(3) = cA^3 x(0), \quad \ldots, \quad y(n-1) = cA^{n-1} x(0) \qquad (3.56)$$

We can combine together the equations (3.53) to (3.56) to produce 

$$\begin{bmatrix} y(0) \\ y(1) \\ y(2) \\ \vdots \\ y(n-1) \end{bmatrix} = \begin{bmatrix} c \\ cA \\ cA^2 \\ \vdots \\ cA^{n-1} \end{bmatrix} x(0) \qquad (3.57)$$

$$= Vx(0) \qquad (3.58)$$

where $V$ is the $n \times n$ matrix on the right-hand side of (3.57) with rows $c, cA, 
cA^2, \ldots, cA^{n-1}$. Provided this matrix $V$ has an inverse $V^{-1}$ we can solve (3.58) to 
produce 

$$x(0) = V^{-1} \begin{bmatrix} y(0) \\ y(1) \\ \vdots \\ y(n-1) \end{bmatrix} \qquad (3.59)$$

Equation (3.59) expresses the initial state of the system in terms of the outputs at 
times $0, 1, 2, \ldots, n-1$. Once $x(0)$ is known we can find $x(1), x(2), \ldots$ by using 
(3.52). In fact, we saw in Chapter 1, equation (1.61), that $x(k) = A^k x(0)$, $k \geq 1$. In 
other words, we can indeed determine the state of the system by measuring the 
outputs, so the system is observable provided the observability matrix $V$ is 
non-singular, that is 

$$\det V \neq 0 \qquad (3.60)$$


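EXAMPLE 3.11 
Let’s return to the discrete system of Example 3.9, where $A$ is the 3 × 3 matrix in 
(3.39). Suppose that the output variable which is measured is 

$$y(k) = 2x_1(k) - x_3(k) \qquad (3.61)$$

so that $c = [2, 0, -1]$. The observability matrix $V$ has first row equal to $c$, second 
row 

$$\begin{aligned} cA &= [2, 0, -1] \begin{bmatrix} 1 & -1 & 2 \\ 3 & 0 & 4 \\ -5 & 6 & -2 \end{bmatrix} \\ &= [2 \times 1 + 0 \times 3 - 1 \times (-5),\; 2 \times (-1) + 0 \times 0 - 1 \times 6,\; 2 \times 2 + 0 \times 4 - 1 \times (-2)] \\ &= [7, -8, 6] \end{aligned}$$

and third row 

$$cA^2 = (cA)A = [7, -8, 6]A = [-47, 29, -30]$$

Notice that as in the case of constructing the columns of a controllability matrix 
via (3.42), we do not compute powers of $A$, but use the expressions 

$$cA^2 = (cA)A, \quad cA^3 = (cA^2)A, \quad \ldots$$

The matrix $V$ is therefore 

$$V = \begin{bmatrix} 2 & 0 & -1 \\ 7 & -8 & 6 \\ -47 & 29 & -30 \end{bmatrix}$$

and using (1.86) the determinant of $V$ is 

$$\begin{aligned} \det V &= 2 \begin{vmatrix} -8 & 6 \\ 29 & -30 \end{vmatrix} - 1 \begin{vmatrix} 7 & -8 \\ -47 & 29 \end{vmatrix} \\ &= 2[(-8 \times -30) - (6 \times 29)] - 1[(7 \times 29) - (-8 \times -47)] \\ &= 305 \neq 0 \end{aligned}$$

showing that the system (3.39) with output (3.61) is observable. 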
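
Example 3.11 and formula (3.59) can be tried together. The following Python sketch is an illustration only: it builds the rows of $V$ by repeatedly multiplying the previous row by $A$, then recovers an initial state from three output readings. The "true" initial state is a hypothetical value used to generate the readings, and the function names and the elimination solver are our own choices.

```python
from fractions import Fraction

def mat_vec(A, v):
    """Matrix A times column vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def vec_mat(v, A):
    """Row vector v times matrix A."""
    return [sum(v[i] * A[i][j] for i in range(len(v))) for j in range(len(A[0]))]

def observability_matrix(A, c):
    """Rows c, cA, ..., cA^(n-1); each row is the previous one times A."""
    rows = [list(c)]
    for _ in range(len(A) - 1):
        rows.append(vec_mat(rows[-1], A))
    return rows

def det3(M):
    """Determinant of a 3x3 matrix by expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("V is singular, so the system is not observable")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n] for row in aug]

A = [[1, -1, 2], [3, 0, 4], [-5, 6, -2]]
c = [2, 0, -1]
V = observability_matrix(A, c)
det_V = det3(V)

# Hypothetical initial state, used only to generate y(0), y(1), y(2).
x0_true = [1, 2, 3]
outputs, x = [], list(x0_true)
for _ in range(3):
    outputs.append(sum(ci * xi for ci, xi in zip(c, x)))
    x = mat_vec(A, x)       # zero input, as in (3.52)

x0_recovered = solve(V, outputs)   # formula (3.59)
print(V, det_V, outputs, x0_recovered)
```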


EXERCISE 3.19 Return to the system described by (3.35), and suppose that the output is 
$$y(k) = -x_1(k) + 3x_2(k)$$
(a) Determine the observability matrix $V$ and show that the system is observable. 


(b) If $y(0) = -5$ and $y(1) = 1$, use (3.59) to determine the initial state $x(0)$. 


EXERCISE 3.20 Consider the system described by 


fer 2 
xe ne[ ihe 
$$y(k) = \beta x_1(k) + x_2(k), \quad k = 0, 1, 2, \ldots$$


Find for what values of the parameter $\beta$ the system is not observable. 


EXERCISE 3.21 Test the system (3.52) for observability when A is the matrix in Exercise 
3.14 and the output is y(k) = x,(k). 


The way we have defined the concepts of controllability and observability shows 
that they are not connected. Indeed, it’s certainly possible for a controllable system 
to be not observable, or for an observable system to be not controllable, as the 
following example illustrates. 


EXAMPLE 3.12 
Consider the system described by 


ai lige a 
aks ti={_ 1 Z)ae+| ute (3.62) 


$$y(k) = \beta x_1(k) + x_2(k), \quad k = 0, 1, 2, \ldots$$
Suppose $a = 1$ and $\beta = 1$: the controllability matrix is 
altos. 
v-ls 3] 
b Ab 
and the observability matrix is 
alut! pati [ie 
v-[ lee 
Using (3.34) gives $\det U = 0$, $\det V = 6$, so the system is observable but not 
controllable. You may have noticed that (3.62) is a combination of the equations 
in Exercises 3.13 and 3.20, where you were asked to find values of the 
parameters $a$ and $\beta$ for which the system was either not controllable or not 
observable. These ‘critical’ values of $a$ and $\beta$ are independent of each other. 
Thus, for example, if $a \neq 1, 2$ and $\beta = -1$ we have controllability but not 
observability. 
In view of the above example, it’s therefore intriguing that mathematically the 
conditions $\det U \neq 0$ for controllability and $\det V \neq 0$ for observability are so 
similar: the matrix $U$ has columns $b, Ab, A^2 b, \ldots, A^{n-1} b$, and the matrix $V$ has rows 
$c, cA, cA^2, \ldots, cA^{n-1}$. This similarity can be exploited to construct what are called 
‘dual systems’ (see Problem 3.16). These can be used to obtain many useful results 
involving controllability and observability. 


EXERCISE 3.22 Show that the system 


x(k+ v=[ 3 5 po 


y(k) = —x, (kK) + x(k) 
is unobservable. Show also that when 
=|2 
x(0) [3] (3.63) 


then the corresponding output is $y(k) = 0$ for all $k \geq 0$. 


Notice in Exercise 3.22 that if the initial state $x(0)$ is zero then the subsequent output 
is also zero; hence, there is no way of distinguishing the initial state (3.63) from 
$x(0) = 0$ by measuring the output. This agrees with our definition of an unobservable 
system. 

To end this section we can consider the differential equation model 

$$\frac{dx}{dt} = Ax, \quad y = cx \qquad (3.64)$$

where $x(t)$, $y(t)$ now depend upon the continuous time variable $t$. As we saw for the 
case of controllability in the previous section, the mathematical condition for 
observability is exactly as before in (3.60), namely that the observability matrix $V$ 
has a non-zero determinant. 


EXAMPLE 3.13 

We looked at the controllability of the electrically heated oven problem in 
Example 3.10. Suppose it is only possible to measure $x_2$, the excess of the 
jacket temperature over the surrounding temperature. We ask whether it is 
possible to find the oven (excess) temperature $x_1$ by measuring the output 
$y = x_2$. We therefore have $c = [0, 1]$, and the matrix $A$ was given in Example 
3.10, so we can construct 

$$V = \begin{bmatrix} c \\ cA \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ 10 & -25 \end{bmatrix}$$

and 

$$\det V = -10 \neq 0$$

This shows that the system is observable, so the oven interior temperature can 
indeed be determined merely by measuring the temperature of the jacket. 


EXERCISE 3.23 Suppose that for the rabbit–fox population model described in Exercise 
3.18 it is only possible to count the total number of animals. Is it nevertheless possible 
to determine the individual numbers of rabbits and of foxes? 


EXERCISE 3.24 Return again to the mechanical system described by (3.12), illustrated in 
Figure 3.4. If it is only possible to measure the displacement $x_1$ of the mass, can its 
velocity $x_2$ be determined? 


We'll now show how to obtain an actual expression for the initial state in the 
differential equation case (3.64), corresponding to (3.59). From (3.64) differentiating $y$ gives 

$$\frac{dy}{dt} = c \frac{dx}{dt} = cAx \qquad (3.65)$$

since $c$ is a constant vector. Similarly, differentiating (3.65) gives 

$$\frac{d^2 y}{dt^2} = cA \frac{dx}{dt} = cA^2 x \qquad (3.66)$$

$$\vdots$$

$$y^{(n-1)} = cA^{n-1} x \qquad (3.67)$$

where $y^{(i)}$ denotes the $i$th derivative of $y$ with respect to $t$. Now set $t = 0$ in each of 
the equations (3.64) to (3.67), giving 

$$y(0) = cx(0), \quad y^{(1)}(0) = cAx(0), \quad y^{(2)}(0) = cA^2 x(0), \quad \ldots, \quad y^{(n-1)}(0) = cA^{n-1} x(0) \qquad (3.68)$$

where $y^{(i)}(0)$ denotes the value of the $i$th derivative of $y$ at $t = 0$. Writing the $n$ 
expressions in (3.68) in combined form gives 

$$\begin{bmatrix} y(0) \\ y^{(1)}(0) \\ y^{(2)}(0) \\ \vdots \\ y^{(n-1)}(0) \end{bmatrix} = Vx(0)$$

so that provided the system is observable we can write 

$$x(0) = V^{-1} \begin{bmatrix} y(0) \\ y^{(1)}(0) \\ \vdots \\ y^{(n-1)}(0) \end{bmatrix} \qquad (3.69)$$

This is an explicit expression for $x(0)$ which requires a knowledge of the output $y$ 
and its first $n - 1$ derivatives evaluated at $t = 0$. 
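
As a small illustration of (3.69), the Python sketch below applies it to the oven of Examples 3.10 and 3.13, where $n = 2$, so only $y(0)$ and the first derivative of $y$ at $t = 0$ are needed. The initial temperatures are hypothetical, and the derivative is generated from the model itself rather than measured; in practice it would have to be estimated from readings. The function name is our own.

```python
from fractions import Fraction

A = [[-20, 20], [10, -25]]   # oven matrix from Example 3.10
c = [0, 1]                   # only the jacket excess temperature is measured

# Observability matrix V with rows c and cA.
cA = [c[0] * A[0][0] + c[1] * A[1][0],
      c[0] * A[0][1] + c[1] * A[1][1]]
V = [c, cA]
det_V = V[0][0] * V[1][1] - V[0][1] * V[1][0]

def recover_initial_state(y0, dy0):
    """x(0) = V^{-1} [y(0), y'(0)], using the 2x2 inverse formula (3.33)."""
    (p, q), (r, s) = V
    x1 = Fraction(s * y0 - q * dy0, det_V)
    x2 = Fraction(-r * y0 + p * dy0, det_V)
    return [x1, x2]

# Hypothetical true initial state, used only to generate the two readings.
x0_true = [3, 5]
y0 = c[0] * x0_true[0] + c[1] * x0_true[1]      # y(0)  = c x(0)
dy0 = cA[0] * x0_true[0] + cA[1] * x0_true[1]   # y'(0) = cA x(0), from (3.65)

print(recover_initial_state(y0, dy0))
```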


EXERCISE 3.25 A system modelled by the linear differential equations (3.64) with 


a=|7) ot c=[1,2] 
is found to have a scalar output 

y(t)= -20e- + 216 
Verify that the system is observable, and hence obtain x(0) using (3.69). 


3.4 LINEAR FEEDBACK 


We introduced the crucial concept of feedback in Section 3.1. In essence, this means 
that the control applied to a system takes account of the current state of that system: 
information about the state is ‘fed back’ to the controller, which reacts appropriately. 
We shall only consider systems with a single input, in either discrete or continuous 
form, respectively 

$$x(k+1) = Ax(k) + bu(k), \quad k = 0, 1, 2, \ldots \qquad (3.70)$$

$$\frac{dx(t)}{dt} = Ax(t) + bu(t) \qquad (3.71)$$

where as before $x$ is the $n \times 1$ column vector describing the state, $A$ is an $n \times n$ 
matrix and $b$ is a constant $n \times 1$ column vector. The idea of linear feedback is to 
make the control a linear combination of the states, that is 

$$u = f_1 x_1 + f_2 x_2 + \cdots + f_n x_n \qquad (3.72)$$

or 

$$u = fx \qquad (3.73)$$

where $f_1, f_2, \ldots, f_n$ are constants, which are the components of the row feedback 
vector $f$. If we apply (3.73) to (3.70) and (3.71) we obtain 

$$x(k+1) = (A + bf)x(k), \qquad \frac{dx}{dt} = (A + bf)x \qquad (3.74)$$

which are called closed loop systems. In either case the matrix of the system is 

$$\mathcal{A} = A + bf \qquad (3.75)$$

and is called the closed loop matrix. A key theorem discovered around 1960 states 
that if the system (either version) in (3.74) is controllable then it is always possible 
to determine a vector $f$ such that the eigenvalues of the matrix $\mathcal{A}$ in (3.75) can be 
made to take any preselected set of $n$ values (subject only to the condition that any 
complex values occur in conjugate pairs). 

You may feel rather intimidated by the apparent complexity of this theorem, so 
let’s explore in some detail what it means. First, we substitute the expression (3.73) 
for the feedback control into the equations (3.70) and (3.71). The complete system 
which results is then given by (3.74), and is called ‘closed loop’ for the reason given 
in Section 3.1 (see Figure 3.2). The systems in (3.74) for the discrete and continuous 
time cases are respectively 

$$x(k+1) = \mathcal{A} x(k) \qquad (3.76)$$

$$\frac{dx(t)}{dt} = \mathcal{A} x(t) \qquad (3.77)$$

where $\mathcal{A}$ is the matrix in (3.75). We saw how to solve difference equations of the 
type (3.76) in Chapter 1, Section 1.4. It was found that 

$$x(k) = \mathcal{A}^k x(0) \qquad (3.78)$$

Furthermore, an expression was given in (1.106) for $\mathcal{A}^k$ which depended upon the 
eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $\mathcal{A}$. You may remember that the eigenvalues are defined 
as the roots of the $n$th-degree polynomial equation 

$$\det(\lambda I - \mathcal{A}) = 0 \qquad (3.79)$$

called the characteristic equation of $\mathcal{A}$. Never mind the details of how these 
eigenvalues are calculated. What matters is that the solution $x(k)$ given in (3.78) of 
the closed loop system depends crucially on these eigenvalues: if we can control 
them, then we can control the behaviour of $x(k)$, which is, after all, our prime 
objective. Our theorem therefore adds precision to the property of controllability, 
which stated that a control satisfying required objectives could be found: the 
theorem tells us just what can be achieved using linear feedback. There is only 
one minor restriction, caused by a law of algebra: since the polynomial equation 
(3.79) has real coefficients its roots either are real or occur in complex conjugate 
pairs $\alpha + i\beta$, $\alpha - i\beta$. Apart from this, we can make the eigenvalues of $\mathcal{A}$ equal to any 
set of values we choose by selecting an appropriate feedback vector $f$. For this 
reason the theorem is often called the eigenvalue assignment theorem. 

Although not covered in this book, similar remarks concerning the solution of 
the differential equation (3.77) apply: the solution again depends upon the 
eigenvalues of $\mathcal{A}$ (actually involving terms in $e^{\lambda_i t}$ rather than $(\lambda_i)^k$), and making the 
$\lambda_i$ equal to a predetermined set of values largely determines the way the system 
behaves. 

In either case, if the system is not controllable then assignment of arbitrary 
eigenvalues to $\mathcal{A}$ is not possible. 
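EXAMPLE 3.14 
We wish to find linear feedback such that when applied to the system 

$$x(k+1) = \begin{bmatrix} 1 & -3 \\ 4 & 2 \end{bmatrix} x(k) + \begin{bmatrix} 1 \\ 2 \end{bmatrix} u(k) \qquad (3.80)$$

the resulting closed loop system has eigenvalues $-1$, $-2$. 
You should check that the system is controllable. The closed loop matrix in 
(3.75) is 

$$\mathcal{A} = A + bf = \begin{bmatrix} 1 & -3 \\ 4 & 2 \end{bmatrix} + \begin{bmatrix} 1 \\ 2 \end{bmatrix} [f_1, f_2] = \begin{bmatrix} 1 + f_1 & -3 + f_2 \\ 4 + 2f_1 & 2 + 2f_2 \end{bmatrix} \qquad (3.81)$$

Notice that the product of the column vector $b$ and the row vector $f$ produces a 
2 × 2 matrix, using the rule set out in Chapter 1, Section 1.4. In general the 
product $bf$ is equal to an $n \times n$ matrix with $b_i f_j$ being the element in row $i$, 
column $j$. The eigenvalues of $\mathcal{A}$ are given by solving (3.79), which here is 

$$\det(\lambda I - \mathcal{A}) = \begin{vmatrix} \lambda - 1 - f_1 & 3 - f_2 \\ -4 - 2f_1 & \lambda - 2 - 2f_2 \end{vmatrix} = 0$$

Using the formula (3.34) we can evaluate this determinant as 

$$\begin{aligned} \det(\lambda I - \mathcal{A}) &= (\lambda - 1 - f_1)(\lambda - 2 - 2f_2) - (3 - f_2)(-4 - 2f_1) \\ &= \lambda^2 - \lambda(f_1 + 2f_2 + 3) + (8f_1 - 2f_2 + 14) \end{aligned}$$

For $\mathcal{A}$ to have eigenvalues $-1$, $-2$ its characteristic polynomial must be 

$$(\lambda + 1)(\lambda + 2) = \lambda^2 + 3\lambda + 2$$

Comparing the coefficients of these two quadratics produces the equations 

$$-f_1 - 2f_2 - 3 = 3$$

$$8f_1 - 2f_2 + 14 = 2$$

which are easily found to have the solution $f_1 = -2$, $f_2 = -2$. The required 
feedback (3.72) is therefore 

$$u = -2x_1 - 2x_2$$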
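
The coefficient matching in Example 3.14 amounts to two linear equations in $f_1$ and $f_2$, so it is easy to automate for any 2 × 2 system. The Python sketch below is one way of doing this and is not the method of the text: it assumes real target eigenvalues and uses the facts that, for a 2 × 2 matrix, the characteristic polynomial is $\lambda^2 - (\text{trace})\lambda + \det$, and that $\det(A + bf) = \det A + f\,\mathrm{adj}(A)\,b$ (the matrix determinant lemma). The function name is our own.

```python
from fractions import Fraction

def assign_eigenvalues_2x2(A, b, lam1, lam2):
    """Feedback (f1, f2) so that A + b f has real eigenvalues lam1, lam2."""
    (a11, a12), (a21, a22) = A
    b1, b2 = b
    trace_A = a11 + a22
    det_A = a11 * a22 - a12 * a21
    # w = adj(A) b, so that det(A + b f) = det A + f1*w1 + f2*w2.
    w1 = a22 * b1 - a12 * b2
    w2 = -a21 * b1 + a11 * b2
    # Match trace and determinant of the closed loop matrix:
    #   b1*f1 + b2*f2 = (lam1 + lam2) - trace A
    #   w1*f1 + w2*f2 = lam1*lam2 - det A
    r1 = (lam1 + lam2) - trace_A
    r2 = lam1 * lam2 - det_A
    d = b1 * w2 - b2 * w1          # equals -det[b, Ab]
    if d == 0:
        raise ValueError("system is not controllable")
    # Cramer's rule for the two unknowns.
    return Fraction(r1 * w2 - b2 * r2, d), Fraction(b1 * r2 - w1 * r1, d)

A = [[1, -3], [4, 2]]
b = [1, 2]
f1, f2 = assign_eigenvalues_2x2(A, b, -1, -2)

# Check: the closed loop matrix should have trace -3 and determinant 2.
closed = [[A[0][0] + b[0] * f1, A[0][1] + b[0] * f2],
          [A[1][0] + b[1] * f1, A[1][1] + b[1] * f2]]
trace = closed[0][0] + closed[1][1]
det = closed[0][0] * closed[1][1] - closed[0][1] * closed[1][0]
print(f1, f2, trace, det)
```

The quantity `d` vanishes exactly when $\det U = 0$, which is how the sketch detects an uncontrollable system.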


EXERCISE 3.26 If 


“it of 


determine a feedback vector f such that the eigenvalues of A + bf are —2, 3. 
EXERCISE 3.27 If 


0 1 0 0 
A=|0 0 1), b=|0 
67 =I) 46 1 


determine $f$ such that $A + bf$ has eigenvalues $-1$, $-2 \pm 3i$. 
EXERCISE 3.28 If 


fh Sh oli} ue 


show that the system is not controllable. Show also that A+ bf has one eigenvalue 
equal to 2 whatever the values of f, and f,, so it is not possible to assign arbitrary 
eigenvalues to the closed loop matrix. 


EXERCISE 3.29 Show that the system described by 


1 
1 ju 
= 


is not controllable. Show also that with linear feedback of the form u=f,x, + fx; the 
closed loop system has two fixed eigenvalues, one of which is equal to —3. Determine 
the second fixed eigenvalue and values of f, and f, such that the third eigenvalue of 
A+ bf is equal to —4. 


When n> 2 the method used in Example 3.14 is not satisfactory in general, and 
a number of other techniques have been devised. One relies on transforming the 
matrix A into ‘companion form’ and the vector b into a special form (see Problem 
3.14). Details are outside the scope of this book. 


EXAMPLE 3.15 

Let's return again to the rabbit-fox population model described in Exercise 3.18. 
Suppose that the values of the constants $a$, $b$, $c$, $d$ in (3.47) and (3.48) are such 
that 

$$\frac{dx_1}{dt} = 2x_1 - 3x_2, \qquad \frac{dx_2}{dt} = 2x_1 - x_2 \qquad (3.82)$$

As indicated earlier, one way to control a ‘population explosion’ is to introduce 
linear feedback in the form of a disease which affects rabbits but not foxes. This 
means that the rate of growth of the rabbit population is reduced by an amount 
$fx_1$, where $f$ is a positive parameter. In other words, the first equation in (3.82) 
becomes 

$$\frac{dx_1}{dt} = 2x_1 - 3x_2 - fx_1$$

Combining this with the second equation in (3.82) shows that the closed loop 
matrix is 

$$\mathcal{A} = \begin{bmatrix} 2 - f & -3 \\ 2 & -1 \end{bmatrix}$$

The eigenvalues of this matrix are given by 

$$0 = \det(\lambda I - \mathcal{A}) = \begin{vmatrix} \lambda - 2 + f & 3 \\ -2 & \lambda + 1 \end{vmatrix} = \lambda^2 + \lambda(f - 1) + f + 4 \qquad (3.83)$$

As we have mentioned, the solution $x_1(t)$, $x_2(t)$ of the closed loop system 
involves terms $e^{\lambda_1 t}$, $e^{\lambda_2 t}$ where $\lambda_1$ and $\lambda_2$ are the roots of (3.83). If either of the 
roots has a positive real part then the exponential term gets larger and larger as 
$t$ increases; this is a ‘population explosion’. If both the roots have a negative 
real part then $e^{\lambda t} \to 0$ as $t \to \infty$, so the population declines to zero after a 
sufficiently long time. If $\lambda_1$ and $\lambda_2$ are purely imaginary, that is $\lambda_1 = i\omega$, $\lambda_2 = -i\omega$ 
where $\omega$ is real, then as we saw in Section 1.2, Chapter 1, 

$$e^{i\omega t} = \cos \omega t + i \sin \omega t$$

so the population oscillates in size but remains finite. The roots of (3.83) have 
the form 

$$\frac{1 - f \pm \sqrt{(f - 1)^2 - 4(f + 4)}}{2}$$

You should be able to see that the real parts of the roots are positive when 
$0 < f < 1$ and negative when $f > 1$; when $f = 1$ the roots are purely imaginary. We 
therefore conclude that the smallest value of the feedback parameter which will 
prevent the numbers of rabbits and foxes from growing ever larger is $f = 1$. 
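
The claim about the sign of the real parts is easily explored numerically. The short Python sketch below evaluates the roots of (3.83) for a few sample values of $f$; the sample values are our own choice, picked to fall on either side of $f = 1$.

```python
import cmath

def closed_loop_roots(f):
    """Roots of lambda^2 + (f - 1)*lambda + (f + 4) = 0, equation (3.83)."""
    disc = (f - 1) ** 2 - 4 * (f + 4)
    root = cmath.sqrt(disc)          # complex square root copes with disc < 0
    return ((1 - f) + root) / 2, ((1 - f) - root) / 2

for f in (0.5, 1.0, 2.0):
    lam1, lam2 = closed_loop_roots(f)
    if lam1.real > 0:
        verdict = "grows (population explosion)"
    elif lam1.real < 0:
        verdict = "decays to zero"
    else:
        verdict = "oscillates, remaining finite"
    print(f, lam1, lam2, verdict)
```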


EXERCISE 3.30 A system has 


“ol 


and linear feedback $u = f_1 x_1 + f_2 x_2$ is applied. Determine the conditions to be satisfied 
by $f_1$ and $f_2$ so that the eigenvalues of the closed loop matrix are purely imaginary. 
(Hint: you’ll need to find the form a quadratic equation must take in order to have 
purely imaginary roots $i\omega$ and $-i\omega$, where $\omega$ is real.) 


3.5 MULTIPLE CONTROLS AND OUTPUTS 


We now look at the more complicated situation where there are several control 
variables. This section is rather more difficult than the rest of the chapter, and can be 
omitted on a first reading of the book. Let’s begin with the discrete equation 

$$x(k+1) = Ax(k) + Bu(k), \quad k = 0, 1, 2, \ldots \qquad (3.84)$$

where now $u(k)$ is a column vector consisting of $m$ control variables $u_1(k), 
u_2(k), \ldots, u_m(k)$ with $m > 1$. If there are $n$ state variables $x_1(k), x_2(k), \ldots, x_n(k)$ 
then in (3.84) the matrix $A$ is $n \times n$ and $B$ is $n \times m$. In practice it will always be the 
case that $m \leq n$, and in fact usually $m < n$ so that $B$ is a rectangular matrix. An 
example of (3.84) with $n = 3$ and $m = 2$ was given in Problem 1.26, Chapter 1, as 
the model of a trout fish farm. Suppose we investigate the controllability problem 
using the same sort of argument that we applied in Example 3.8 in the previous 
section. Without going into details it turns out that the controllability matrix now 
becomes 

$$U = [B, AB, A^2 B, \ldots, A^{n-1} B] \qquad (3.85)$$

which is exactly the same as (3.41) except that the column vector $b$ is replaced by the 
$n \times m$ matrix $B$. Notice that $U$ in (3.85) has $n$ rows as before, but now has $mn$ 
columns. The controllability condition therefore cannot now be that $U$ is 
non-singular (or equivalently $\det U \neq 0$) since the concepts of non-singularity and 
determinant only apply when $U$ is square. Instead, the controllability condition is 

$$\operatorname{rank} U = n \qquad (3.86)$$

What does this mean? We must now spend some time explaining the idea of the rank 
of a matrix, and then look at a method of computing the rank. Before doing so, let's 
work out $U$ in a simple case. 

EXAMPLE 3.16 


Let the matrices in (3.84) be 


Ke) 01 
0 0 1), B=/1 0 (3.87) 
010 


A= 


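so that $n = 3$ and $m = 2$. The product $AB$ is found using the rule explained in 
Section 1.4 of Chapter 1 (see equation (1.64)), giving 

$$AB = \begin{bmatrix} 1 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}$$

and 

$$A^2 B = \begin{bmatrix} 2 & 1 \\ 1 & 0 \\ 1 & 0 \end{bmatrix}$$

so that the controllability matrix (3.85) is 

$$U = [B, AB, A^2 B] = \begin{bmatrix} 0 & 1 & 1 & 1 & 2 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \end{bmatrix} \qquad (3.88)$$

with columns labelled $c_1, c_2, c_3, c_4, c_5, c_6$ from left to right. 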

The condition for controllability is that the rank (yet to be defined!) of the 
matrix in (3.88) must be three. 


For our present purpose we can define the rank of a matrix as the largest number 
of independent columns which it possesses. A set of columns of a matrix is called 
independent if no column in this set can be expressed as a linear combination of any 
of the other columns in the set. For example, the first and second columns $c_1$ and $c_2$ 
in (3.88) are independent of each other since there is no way that we can express $c_1$ 
as a multiple of $c_2$. Remember that a linear combination of columns $c_1, c_2, c_3, \ldots$ 
simply means the expression $a_1 c_1 + a_2 c_2 + a_3 c_3 + \cdots$, where the $a_i$ are constants, not 
all zero. From (3.88) we see that 

$$\underbrace{\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix}}_{c_3} = \underbrace{\begin{bmatrix} 0 \\ 1 \\ 1 \end{bmatrix}}_{c_1} + \underbrace{\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}}_{c_2}$$

so $c_3$ is not independent; it is said to be dependent upon $c_1$ and $c_2$ since it can be 
expressed as a linear combination of them. Similarly you can see that the other 
columns in (3.88) satisfy 

$$c_4 = c_2, \quad c_5 = c_1 + 2c_2, \quad c_6 = c_2$$

so all the other columns of (3.88) can be expressed in terms of $c_1$ and $c_2$. Since the 
largest number of independent columns of $U$ in (3.88) is therefore two, its rank is 
two. Hence the system with matrices $A$ and $B$ in (3.87) is not controllable, since $U$ 
does not have rank 3. 


EXAMPLE 3.17 

The matrix 

$$M = \begin{bmatrix} 1 & 4 & 2 & 0 & 5 \\ 2 & 8 & 2 & -2 & 8 \\ 3 & 12 & 0 & -6 & 9 \end{bmatrix} \qquad (3.89)$$

has rank 2, since 

$$c_2 = 4c_1, \quad c_4 = c_3 - 2c_1, \quad c_5 = 3c_1 + c_3$$

but $c_1$ and $c_3$ are independent: we cannot express $c_3$ in the form $ac_1$ for any 
value of $a$. Hence there are two independent columns of $M$, so $M$ has rank 2. 


EXERCISE 3.31 For what values of k will the matrix 


have rank equal to (a) 1, (b) 2, (c) 3? 


Some relevant facts about rank are: 


(i) A matrix has rank 0 only if all its elements are zero. 

(ii) The rank of a matrix cannot be bigger than the smaller of its two 
dimensions. For example, a 3 x 4 matrix cannot have rank bigger than three. 

(iii) In view of (ii), the $n \times mn$ controllability matrix $U$ in (3.85) cannot have 
rank bigger than $n$. Hence the controllability condition requires that rank 
$U$ has its maximum possible value. 


We now need to describe a method for computing the rank of a matrix. The method 
is called gaussian elimination after the German supermathematician Gauss (he 
worked in the first half of the nineteenth century). 
We shan’t go into details as to why the method works — these belong in a book 
on matrix algebra. 
(i) We begin by considering a square (upper) triangular matrix, that is all the 
elements below the principal diagonal (northwest to southeast) are zero. 
For example, 

$$M_1 = \begin{bmatrix} 1 & 3 & 4 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (3.90)$$

is a 3 × 3 triangular matrix. The elements on the diagonal are called the 
pivots, and the rank is simply equal to the number of non-zero pivots. 
Thus $M_1$ in (3.90) has rank 3 (the pivots are 1, 2, 1) and 


fi..3 4 1 
= |0-2,.9) =! 

M,= |) 073. 2 (3.91) 
lo 000 


has rank 3 since there are three non-zero pivots (1, 2, 3). In each case the 
diagonal has been indicated by a dotted line. Notice that the last row of 
$M_2$ in (3.91) consists of all zeros. 
(ii) In the same way, the rank of a rectangular matrix which begins with a 
triangular block is also equal to the number of non-zero pivots, as the 
following example illustrates: 


esa eed 
M,=|0 2 0:5 6], rankM,=3 
1} 4 


0 


The triangular block to the left of the dashed line is in fact the matrix M_1
in (3.90).

This property also holds if there are rows of zeros in the triangular 
block, provided these continue right along the matrix, as illustrated by 


oad) el oT 
2 (eee Sy fee ee) bs 
AM T=al ate en | ea 
0.0.0 200 OnG 
—'N,—> 
Another example is 
aid On gel 
2 5 =1'r 10 
Ms; = H 
5" j0 0 0 01/00 0 
000 0/000 


where rank M_5 = 2 because of the non-zero pivots 1, 2. However, for the
matrix


etn ee 
ee Sal i 
= H a) 
Me=1q 0 0 0:0 0 -1 aad 
000 0104 0 


c_1  c_2  c_3  c_4  c_5  c_6  c_7


we cannot say that rank M_6 = 2, because the bottom two rows of zeros in
the triangular block (the first four columns of M_6) have some non-zero
elements along the remainder of the rows in the last three columns of
M_6.

(iii) To handle a matrix like M_6 in (3.92), we need the fact that swopping
around columns of a matrix does not alter its rank. In (3.92) if we
interchange the third and seventh columns, and the fourth and sixth
columns, we get


Reo Sietelo uO ola 
Od, Oimkjillie=Iee5 
0 0 -1,0/0 00 
0.0) (0) 74:08 10/0 


c_1  c_2  c_7  c_6  c_5  c_4  c_3


Since this now contains four non-zero pivots (1, 2, −1, 4) it follows that
rank M_6 = 4. It is convenient to denote the column swops by c_3 ↔ c_7 and
c_4 ↔ c_6.

(iv) We can use the following elementary row operations to reduce a
rectangular matrix to the form described in (ii):


Interchange any two rows of the matrix: r_i ↔ r_j denotes swopping
round the ith and jth rows.

Add an arbitrary multiple of any row to any other row: r_i + p r_j denotes
adding p times row j to row i, where p is any positive or negative number.


It's best to consider some examples to see how the method works in practice.


EXAMPLE 3.18


(a) Consider the controllability matrix U in (3.88). Our objective is to produce a
triangular block with the maximum possible number of non-zero pivots. This
is done by applying appropriate column swops and/or elementary row
operations. In order to begin, we must get a non-zero pivot in the (1, 1)
position of the matrix, so we interchange columns 1 and 2 in (3.88) to get


Oe eh 2] 
OF fn Te 0) od, 
O fd 1 0 4.6 


We now subtract row 2 from row 3 (i.e. r_3 − r_2) to get


fi eor ai fateabes 
0111010 
000/000 


and this has the required form — no further reduction is possible. Since the 
triangular block to the left of the dashed line has two non-zero pivots, the 
rank is 2, agreeing with what we found earlier. 

(b) Consider the matrix M in (3.89) and apply operations to it as indicated, so
as to obtain the first column of the triangular block, with zeros below the
first pivot:


The long arrow indicates what happens to M when twice row 1 is
subtracted from row 2, and three times row 1 from row 3. We haven't
finished, because we can still get another non-zero pivot by moving any of
the elements −3 to the (2, 2) position. For example, swop rows 2 and 3,
then swop columns 2 and 3:


ny 
t 
° 
fos! 
| 
CHa 
oon 
1d 
ows 
I 
own 


The required form has now been obtained, showing that the rank is 2 
because of the two non-zero pivots 1, -3. 
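(c) The following example is self-explanatory:

$$
M = \begin{bmatrix} 3 & -2 & 0 & 1 \\ -1 & 1 & 2 & 2 \\ 1 & 0 & 4 & 5 \end{bmatrix}
\xrightarrow[\;r_3 - \frac{1}{3} r_1\;]{\;r_2 + \frac{1}{3} r_1\;}
\begin{bmatrix} 3 & -2 & 0 & 1 \\ 0 & \tfrac{1}{3} & 2 & \tfrac{7}{3} \\ 0 & \tfrac{2}{3} & 4 & \tfrac{14}{3} \end{bmatrix}
\xrightarrow{\;r_3 - 2 r_2\;}
\left[\begin{array}{ccc|c} 3 & -2 & 0 & 1 \\ 0 & \tfrac{1}{3} & 2 & \tfrac{7}{3} \\ 0 & 0 & 0 & 0 \end{array}\right]
$$

The pivots 3, 1/3, 0 show that rank M = 2.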


To summarize the procedure: 


(i) If necessary, apply column and/or row swops to get a non-zero pivot in 
the (1, 1) position. 

(ii) Apply elementary row operations to reduce all the elements in the first 
column below the first pivot to zero. 

(iii) Repeat (i) if necessary for a non-zero second pivot, in the (2, 2) position; 
repeat (ii) so that all the elements in the second column below the pivot 
are zero. 

(iv) Continue like this until the required form containing a triangular block is
obtained.
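
The summarized procedure can be sketched in Python. This is only an illustrative sketch, not a program from the book: the function name `rank_by_elimination` is hypothetical, exact fractions are used so that no rounding occurs, and the 3 × 4 test matrix is our reading of the matrix in Example 3.18(c).

```python
from fractions import Fraction


def rank_by_elimination(rows):
    """Return the rank of a matrix given as a list of rows (hypothetical helper)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for k in range(min(n_rows, n_cols)):
        # Steps (i)/(iii): look for a non-zero entry that can be moved to (k, k).
        found = next(((i, j) for j in range(k, n_cols)
                      for i in range(k, n_rows) if m[i][j] != 0), None)
        if found is None:
            break  # only zeros remain, so the required form has been reached
        i, j = found
        m[k], m[i] = m[i], m[k]              # row swop
        for row in m:
            row[k], row[j] = row[j], row[k]  # column swop
        # Step (ii): make every entry below the pivot zero.
        for r in range(k + 1, n_rows):
            p = m[r][k] / m[k][k]
            m[r] = [x - p * y for x, y in zip(m[r], m[k])]
        rank += 1  # one more non-zero pivot
    return rank


# Assumed entries of the matrix in Example 3.18(c)
example_c = [[3, -2, 0, 1], [-1, 1, 2, 2], [1, 0, 4, 5]]
print(rank_by_elimination(example_c))
```

The count of non-zero pivots is returned directly, so the triangular block itself is never displayed; printing `m` inside the loop would show the intermediate matrices.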


EXERCISE 3.32 Determine the rank of the following matrices: 
(a) 
(b)
Sh Mey eye) 
2 Ahh OT 
2) 4 =6)58 

(c) 
Ol el On ae 
) I ee a ee | 
Deel) 3) 2 ieee 
i205 t= 9 8a 
0S) le. 1S 20eey, 


EXERCISE 3.33 Compute the controllability matrix U in (3.85) when 


Tez 10) ee! 
A=| 3 4 Lh. Bee (3.93) 

-1 2 -5 Eile 44 
Determine rank U, and hence decide whether the system (3.84) is controllable in this
case.


EXERCISE 3.34 Show that the system (3.84) is controllable when 


of 670 0 0 
39 0) 2 1 0 
Beha) TOMO Ti CallONRO 
0 -2 0 0 0: 1 


If one of the control variables ceases to operate, that is either u_1 = 0 or u_2 = 0, test
whether the system remains controllable in each case. Notice that in each of these
cases B reduces to a column vector only.


EXERCISE 3.35 If 


=[2 (@=3)] pif! 1 
a=[i 2 } =| fee 


determine for what values of the parameter a the system (3.84) is not controllable.
Investigate the situation if the first control variable ceases to operate, that is u_1 = 0
(compare with Exercise 3.17).


Before going on to consider observability for the case of several outputs, it's useful
to note what happens when gaussian elimination is applied to a square n × n matrix
M. In this case the rank is still equal to the number of non-zero pivots after the
matrix has been reduced to triangular form. If any pivot is zero then det M = 0; if all
the pivots are non-zero then


det M = (−1)^t × product of the pivots


where t is the total number of row and column interchanges (if any).
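
The pivot rule for determinants can be sketched in the same way. Again this is an assumption-laden illustration: `det_by_elimination` is a hypothetical name, only row interchanges are used (a column with no usable pivot means the determinant is zero), and the two test matrices are our readings of the matrices in Example 3.19(c) and (d).

```python
from fractions import Fraction


def det_by_elimination(rows):
    """Determinant of a square matrix as (-1)^t times the product of the pivots."""
    m = [[Fraction(x) for x in row] for row in rows]
    n = len(m)
    sign, product = 1, Fraction(1)
    for k in range(n):
        # Find a row at or below position k with a non-zero entry in column k.
        pivot_row = next((i for i in range(k, n) if m[i][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)  # a zero pivot cannot be avoided
        if pivot_row != k:
            m[k], m[pivot_row] = m[pivot_row], m[k]
            sign = -sign        # each interchange changes the sign
        for r in range(k + 1, n):
            p = m[r][k] / m[k][k]
            m[r] = [x - p * y for x, y in zip(m[r], m[k])]
        product *= m[k][k]
    return sign * product


print(det_by_elimination([[3, -2, 0], [-1, 1, 2], [1, 0, 6]]))
print(det_by_elimination([[1, 3, -18], [0, 7, -19], [0, -10, 59]]))
```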
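EXAMPLE 3.19


(a) The determinant of the triangular matrix M_1 in (3.90) is 1 × 2 × 1 = 2. The
determinant of M_2 in (3.91) is zero, since there is a zero pivot.

(b) The determinant of any triangular matrix is just equal to the product of the
elements on the principal diagonal, for example

$$\det \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix} = a_{11} a_{22} a_{33}$$

(c)

$$
\begin{bmatrix} 3 & -2 & 0 \\ -1 & 1 & 2 \\ 1 & 0 & 6 \end{bmatrix}
\xrightarrow[\;r_3 - \frac{1}{3} r_1\;]{\;r_2 + \frac{1}{3} r_1\;}
\begin{bmatrix} 3 & -2 & 0 \\ 0 & \tfrac{1}{3} & 2 \\ 0 & \tfrac{2}{3} & 6 \end{bmatrix}
\xrightarrow{\;r_3 - 2 r_2\;}
\begin{bmatrix} 3 & -2 & 0 \\ 0 & \tfrac{1}{3} & 2 \\ 0 & 0 & 2 \end{bmatrix}
$$

The determinant of the original matrix is therefore 3 × (1/3) × 2 = 2.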
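(d) Consider the controllability matrix U in Example 3.9:

$$
\begin{bmatrix} 1 & 3 & -18 \\ 0 & 7 & -19 \\ 0 & -10 & 59 \end{bmatrix}
\xrightarrow{\;r_3 + \frac{10}{7} r_2\;}
\begin{bmatrix} 1 & 3 & -18 \\ 0 & 7 & -19 \\ 0 & 0 & \tfrac{223}{7} \end{bmatrix}
$$

so det U = 1 × 7 × (223/7) = 223, agreeing with what was found earlier.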


For determinants having numerical elements, evaluation using gaussian 
elimination is always preferable to using expansion formulae like that in (1.86). It is 
also worth mentioning that an extension of gaussian elimination provides a good way 
of computing the inverse of a matrix. 


EXERCISE 3.36 Use gaussian elimination to evaluate det V, where V is the observability 
matrix in Example 3.11. 


EXERCISE 3.37 Use gaussian elimination to evaluate


t. 2d 
feed) ee ees 
act Tle 
eee ey Yims 
EXERCISE 3.38 Show that the system with matrices


01 00 0 
Az|-1 0 0 0} 4/2 
OG Gal 0 
00 -1 0 1 


is not controllable. 


To complete our discussion on controllability with multiple controls, we note
that the condition (3.86) still applies to the differential equation description (3.7),
which we repeat for convenience:

$$\frac{dx}{dt} = Ax + Bu \tag{3.94}$$

where u is again an m × 1 column vector and B is n × m. The expression for U in
(3.85) is also unchanged.

Let’s now move on to observability when there are multiple outputs, so the 
output is a column vector y with r components y,,..., y,. As before we assume 
linearity, meaning that each output variable can be expressed as a linear combination 
of the states. We can therefore write 


y=Cx (3.95) 


where C is an r × n matrix and x is the state vector. Equation (3.95) can apply either
to the difference equation model (3.84) or to the differential equation model (3.94).

Notice that if C is square and non-singular then we immediately obtain from
(3.95)

$$x = C^{-1} y$$

showing that the state can certainly be determined from the output, so by definition
the system is observable. Otherwise, we have r < n and it turns out that the condition
for observability is

$$\operatorname{rank} V = n \tag{3.96}$$

where V is the observability matrix

$$V = \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix} \tag{3.97}$$


Notice that V has n columns and rn rows. When r = 1 then (3.97) reduces to the
previous expression in (3.57) for the single output case. In order to compute the rank
of V in (3.97) we use the fact that if the rows and columns of a matrix are
interchanged this does not affect the rank. This is called transposing a matrix, and is
denoted by V^T (see Problem 3.16). The method of gaussian elimination can then be
applied to V^T, exactly as was done for U.
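
The observability test (3.96) can be sketched as follows. Everything here is illustrative: the helper names are hypothetical, and the 2 × 2 matrix `a` and the output rows `c_good`, `c_bad` are made-up values chosen only to show one observable and one unobservable case.

```python
from fractions import Fraction


def mat_mul(p, q):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*q)] for row in p]


def rank(rows):
    """Rank by gaussian elimination, using row operations and column swops."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    count = 0
    for k in range(min(n_rows, n_cols)):
        found = next(((i, j) for j in range(k, n_cols)
                      for i in range(k, n_rows) if m[i][j] != 0), None)
        if found is None:
            break
        i, j = found
        m[k], m[i] = m[i], m[k]
        for row in m:
            row[k], row[j] = row[j], row[k]
        for r in range(k + 1, n_rows):
            p = m[r][k] / m[k][k]
            m[r] = [x - p * y for x, y in zip(m[r], m[k])]
        count += 1
    return count


def observability_matrix(a, c):
    """Stack C, CA, ..., CA^(n-1) as in (3.97)."""
    blocks, current = [], [row[:] for row in c]
    for _ in range(len(a)):
        blocks.extend(current)
        current = mat_mul(current, a)
    return blocks


def is_observable(a, c):
    v = observability_matrix(a, c)
    v_transposed = [list(col) for col in zip(*v)]  # rows and columns interchanged
    return rank(v_transposed) == len(a)


a = [[0, 1], [-2, -3]]      # hypothetical state matrix
c_good = [[1, 0]]           # hypothetical output: V = [[1, 0], [0, 1]]
c_bad = [[1, 1]]            # hypothetical output: V = [[1, 1], [-2, -2]]
print(is_observable(a, c_good), is_observable(a, c_bad))
```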


EXAMPLE 3.20


Let the matrices in (3.84) (or (3.94)) and (3.95) be 
fist) 0) 
0 0 1), c=[} 1 4 
0 30 


A= 


so that n=3 and r=2. Using our multiplication rule we get 


ei bee eet 
ca=|} 2 | 
zie ee 2% 
cA =(cA\a=1 : sy 
so that 
T3228 
120 at 
ee lees 
V= rier ea (3.98) 
Leela 
Ne ae 
To obtain V^T, we write each row of (3.98) as a column to obtain
is Aivtley Ley Deny 
Viet dO lea? (3.99) 
OF: a Or Ain 


The first row of (3.98) becomes the first column in (3.99), the second row of 
(3.98) is the second column in (3.99), and so on. To compute the rank of (3.99) 
we then apply the gaussian elimination procedure as follows: 


pea tik al 
‘T 
Vi——s|'0| <1 0 1 1 0 
Gb a th ae 
ee il a Wish hg 
Set) <i iiniea) oy 
(Ole cOlmti it, 22 


There are three non-zero pivots in the triangular block to the left of the dashed 
line, showing rank V=3 and hence the system is observable. 
EXERCISE 3.39 Compute the observability matrix V in (3.97) when A is the matrix in
(3.93) and 


EGE OY 
eal? a 


and hence test for observability.

To close this chapter, it's interesting to note that the eigenvalue assignment
theorem still holds when there are multiple controls.
Provided the system is controllable it's always possible to find an m × n constant
matrix F such that the closed loop matrix A + BF has preassigned eigenvalues
(subject only to the same condition as before, that any complex eigenvalues occur in
complex conjugate pairs). The linear feedback now has the form u = Fx, meaning
that each control variable is a linear combination of the states, that is

$$u_i = f_{i1} x_1 + f_{i2} x_2 + \cdots + f_{in} x_n, \qquad i = 1, 2, \ldots, m$$

where f_ij is the element in row i, column j of F. Actual determination of such
matrices F is well outside the scope of this book.


EXERCISE 3.40 Verify that when 
eS iO elie wal 
aaah ltl 
then the linear feedback 
Uy = -3x,—xX,, Uy =3X, +X, 
produces a closed loop matrix with eigenvalues —3 and —4. 
EXERCISE 3.41 Show that if the matrix B in (3.84) is square and non-singular then

$$u(k) = B^{-1}[x(k+1) - Ax(k)]$$


PROBLEMS 


3.1 Consider equation (3.20) which describes the motion of a damped mass-spring
system shown in Figure 3.7 with u = 0, namely

$$m\frac{d^2x}{dt^2} + p\frac{dx}{dt} + kx = 0$$

with m > 0, p > 0, k > 0. Here x is the displacement from rest, and is given by

$$x(t) = \begin{cases} \alpha e^{\lambda_1 t} + \beta e^{\lambda_2 t}, & \lambda_1 \ne \lambda_2 \\ e^{\lambda_1 t}(\gamma + \delta t), & \lambda_1 = \lambda_2 \end{cases}$$

where α, β, γ, δ are arbitrary constants and λ_1, λ_2 are the roots of the quadratic equation

$$m\lambda^2 + p\lambda + k = 0$$

Consider each of the three possibilities for the roots: both real and distinct; both
complex; both real and equal; and show that in all cases x(t) → 0 as t → ∞.


3.2 Test for controllability the system described in


(a) Exercise 3.6, equation (3.22) 
(b) Exercise 3.7 

(c) Exercise 3.9 

(d) Exercise 3.10 

(e) Exercise 3.11, using slaughter. 


3.3 Suppose that in the redwood forest model described in Exercise 1.34 it is feasible
only to count all the trees in each 50 year period. Can the number of trees in each of 
the three age groups be determined? 


3.4 Consider the cattle ranching model described in Problem 1.23. If it is practicable only
to count the total number of cattle, is it possible to determine the number of animals 
in each age group? 


3.5 It was seen in the buffalo population model in Problem 1.28 that under the assumed
conditions the numbers of animals would grow by 6.3% per year. 

Suppose that (as in Exercise 3.11) a linear feedback slaughtering policy had been 
adopted, killing for food pF,,, adult females in year k + 1, so that the first equation in 
Problem 1.28 is replaced by 


Fyyq =0.95F 4, + 0.12F, — PFs 


Use the z-transform (see Section 1.3, Chapter 1) to show that the total population 
would have remained constant even if 7% of adult females were killed each year. 


3.6 The equations describing the vertical motion of a hot air balloon can be expressed as


aT Tu 

dt 

dv_ 1 1 
ap Sy Uae. 
Gh y 

dt 


where the vertical speed is v, the change from equilibrium altitude is h, the change in
the temperature of the air in the balloon from the equilibrium temperature is T, and w
is the constant vertical wind speed. The control variable u is proportional to the
change in heat added to the air in the balloon. Take as state variables


x_1 = T, x_2 = v, x_3 = h, x_4 = w

and write the equations in the matrix form (3.7). Show that the system is not
controllable. Verify that linear feedback

u=—}u-jh 


produces a closed loop system with eigenvalues 0, -}, -3, -3. 
3.7 The motion of a helicopter hovering in still air is described by the equations


2 
iy +a ce bu 
ar’ dt 
2 
as 99 _go=du 
dt? dt 
where θ is the pitch angle of the fuselage, s is the horizontal distance of the centre of
mass of the helicopter from the hover point and a, b, c and d are constants. The
control variable u is the tilt angle of the rotor thrust with respect to the fuselage. Take

$$x_1 = \theta, \quad x_2 = s, \quad x_3 = \frac{d\theta}{dt}, \quad x_4 = \frac{ds}{dt}$$

as the state variables and write the equations in the matrix form (3.7).
If b = 1, d = 2 and c = 2a show that the system is controllable irrespective of the
value of a.


3.8 Consider again the mechanical system shown in Figure 3.6. If in addition dampers are
connected between each of the masses and the fixed support then the equations (3.19) 
are replaced by 


0 0 ripe) 0 

dx_| 0 0 Oat 0 

a |-6 3 -4 o|**/0|" 
3 -G+h4A 0 -6 i 


It is possible to measure only the relative displacement x,—.x,. Construct the 
observability matrix. Reduce it to triangular form by gaussian elimination, and hence 
determine for what values of k the system is not observable. 


3.9 Determine whether the trout fish farm model described in Problem 1.26 is
controllable. 


3.10 If


0 1 0 if .0)
A=|0 0 1), B=|0 1
6 s11 46: P


verify that linear feedback
Uy = -3x,+4%,-2%3, Uy = —3x, +42, - 25


produces a closed loop system with eigenvalues 1, 1, 3.


3.11 Consider the model of the blue whale population described in Example 1.20. The
matrix A in (1.57) has an eigenvalue greater than one, so the population continues to 
increase in size. 


(a) Is it possible to prevent this increase by culling a proportion of females under 4 
years old? (Notice that this affects x,(k +1) only.) 
(b) If only the total number of females under 8 years old can be counted, can the
numbers of females in each age group be determined?


3.12 Two rods of unit length swing from a fixed support and are connected to each other
by a spring as shown in Figure 3.10.


Figure 3.10 


Masses m are fixed at the ends of the rods (whose weights can be neglected) and for
small oscillations about the vertical the equations of motion are


dx; 
m —— =—mgx, + uy - u 
at J Ses Vins Neat 


dx, 
m a = —Mgx, - 2ka’x, + + uy 


where g is the gravitational constant, u_1 and u_2 are applied control forces, k is the
spring constant, x_1 = θ_1 + θ_2, x_2 = θ_1 − θ_2 and

$$x_3 = \frac{dx_1}{dt}, \qquad x_4 = \frac{dx_2}{dt}$$

Write the equations in the matrix form (3.7). Show that the system is controllable. 


Show also that if the two forces are equal, that is u_1 = u_2 = u, then the system is not
controllable.


3.13 A certain mechanical system consisting of two masses m_1 and m_2 connected by
springs and dampers is described by the equations 


0 0 1 0 0 

dx)|| 80 0 0 1 0 
at | Ki/m, ki/m, -d,/m, d,/m, ee 1/m, a 

Kylm, (ky +k )/m, dm, -(d, + d,)/m, 0 


If m, =m, =1, d,=d,=1 and k,=}, compute the controllability matrix. Reduce this 
to triangular form using gaussian elimination, and hence determine under what 
conditions on k, the system is controllable. 
3.14 The matrix

$$A_c = \begin{bmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{bmatrix}$$

is said to be in companion form because it has characteristic polynomial

$$\det(\lambda I - A_c) = \lambda^n + a_1\lambda^{n-1} + a_2\lambda^{n-2} + \cdots + a_{n-1}\lambda + a_n$$

which can be read off from the last row of A_c.
For example, when n = 3

$$A_c = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ -a_3 & -a_2 & -a_1 \end{bmatrix}$$

Verify by expanding the determinant that

$$\det(\lambda I - A_c) = \lambda^3 + a_1\lambda^2 + a_2\lambda + a_3$$

Consider the control system model

$$\frac{dx}{dt} = A_c x + du$$

where d is an n × 1 column vector with all entries zero except d_n = 1. If linear
feedback

$$u = -(f_n x_1 + f_{n-1} x_2 + \cdots + f_2 x_{n-1} + f_1 x_n)$$

is applied obtain the closed loop matrix and verify that it is also in companion form.
Hence deduce that the closed loop characteristic polynomial is

$$\lambda^n + (a_1 + f_1)\lambda^{n-1} + (a_2 + f_2)\lambda^{n-2} + \cdots + (a_{n-1} + f_{n-1})\lambda + (a_n + f_n)$$
3.15 For the discrete system


x(k+1) = Ax(k), k = 0, 1, 2, ...


with output

y(k) = Cx(k)

show that

$$Vx(k) = \begin{bmatrix} y(k) \\ y(k+1) \\ \vdots \\ y(k+n-1) \end{bmatrix} \tag{3.100}$$

where V is the observability matrix defined in (3.97).
1 0
Axil 2 1 
Die) 


show that the system is observable. Hence if 


w=| 2 
‘ 1-3(2)* + 3* 


determine x(k) by selecting three independent equations in (3.100) and solving for
x_1(k), x_2(k) and x_3(k).


3.16 The transpose A^T of a matrix A is obtained by interchanging the rows and columns.
For example, if

$$A = \begin{bmatrix} a_1 & a_2 \\ a_3 & a_4 \end{bmatrix}$$

then writing the first row as the first column and the second row as the second column
gives

$$A^T = \begin{bmatrix} a_1 & a_3 \\ a_2 & a_4 \end{bmatrix}$$

The same holds for vectors:

$$b = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix}, \quad b^T = [b_1, b_2, b_3]; \qquad c = [c_1, c_2, c_3], \quad c^T = \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$

Notice that applying the transpose operation twice brings you back to where you
started from, that is

$$(A^T)^T = A, \quad (b^T)^T = b, \quad (c^T)^T = c$$

Consider two systems

$$x(k+1) = Ax(k) + bu(k), \qquad y(k) = cx(k) \tag{I}$$

and

$$x(k+1) = A^T x(k) + c^T u(k), \qquad y(k) = b^T x(k) \tag{II}$$

Denote by U_I, V_I and U_II, V_II their respective controllability and observability
matrices. Suppose for simplicity that n = 3, and show that

$$U_{II} = V_I^T, \qquad V_{II} = U_I^T$$

You will need the results

$$(Ab)^T = b^T A^T, \qquad (A^2)^T = (A^T)^2$$

Since controllability of (II) is equivalent to observability of (I), and vice versa, the
systems (I) and (II) are called 'dual'.
4.1 Searching for an optimum
4.2 Linear programming
4.3 Transportation models
4.4 Networks and graphs
4.5 Optimal control
Problems
Further reading


We remarked at the beginning of Chapter 3 that a basic human desire is to control 
events — to try and make things happen as we would like them to. This is all part of 
aiming to ‘get the best out of life’. This striving for optimal solutions will be 
explored in this chapter through a variety of situations. 


4.1 SEARCHING FOR AN OPTIMUM 


Here we are interested in finding the point at which a function f(x) of one 
independent variable x achieves its maximum or minimum value. 


EXAMPLE 4.1


A closed cylindrical beer can is made of sheet metal of constant thickness and 
is to hold 440 ml. The problem is to find the dimensions of the can so that the 
amount of metal used in its construction is as small as possible, thereby 
minimizing its cost. 


Suppose the radius of the can is x cm and its height is h cm. The volume is

$$\pi x^2 h = 440$$

so that

$$h = \frac{440}{\pi x^2}$$

The area of each circular flat end is πx². The circumference of the curved part is
2πx. If this curved surface is opened out flat it becomes a rectangle of width
2πx and height h, so its area is 2πxh. Hence the total area of sheet metal is

$$f(x) = 2\pi x^2 + 2\pi x h = 2\pi x^2 + 2\pi x\left(\frac{440}{\pi x^2}\right) = 2\pi x^2 + \frac{880}{x}$$


We want to find the value of x which minimizes f(x). 
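
To get a feel for this function before any search method is applied, it can simply be tabulated. The short sketch below (the name `can_area` is ours, not the book's) evaluates the area of sheet metal at a few whole-number radii; among these trial values the smallest area occurs at x = 4, which suggests the minimizing radius lies somewhere between 3 cm and 5 cm.

```python
import math


def can_area(x):
    """Total area (cm^2) of sheet metal for a 440 ml can of radius x cm."""
    return 2 * math.pi * x ** 2 + 880 / x


# Tabulate the area at a few trial radii
for radius in [2, 3, 4, 5, 6]:
    print(radius, round(can_area(radius), 1))
```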


To fix ideas, consider the problem of finding the maximum value of some given
function f(x) of an independent variable x. Suppose it is known that within a certain
range of values [a, b] of x, that is a ≤ x ≤ b, the function

(i) has a single maximum point at x = x*, and

(ii) is unimodal on this interval [a, b].

Condition (ii) means that if f(x) is a continuous function then it has a single 'hump'
as illustrated in Figure 4.1. More precisely, if


x_1 < x* < x_2


then as shown in Figure 4.1 we have

f(x_1) < f(x*),  f(x*) > f(x_2)


Figure 4.1


Notice, however, that a unimodal function need not be continuous — that is, it 
may contain ‘jumps’, as illustrated by the example in Figure 4.2. Alternatively a 
function may not be unimodal as shown by the example in Figure 4.3 where 
there are two ‘humps’. Finding an overall maximum in such cases is more 
difficult. 

In practical applications the function f(x) often represents a profit which is to be 
maximized. Alternatively, as seen in Example 4.1, the function may represent a cost 
to be minimized. This causes no difficulties, since minimizing f(x) is the same as 
maximizing —f(x). 


Figure 4.2


Figure 4.3
EXERCISE 4.1 Find an expression for the rectangular area which can be enclosed by a
500 m length of fencing (let one side of the rectangle be x m). It is required to find 
the largest possible area. 


EXERCISE 4.2 A rectangular sheet of metal 100 cm by 50 cm is to be made into an open
rectangular box by cutting out squares of side x cm from each corner as shown in
Figure 4.4 and then folding up the sides. Obtain an expression for the volume of the
box, which is to be maximized (see Problem 4.1).


Figure 4.4


If you've studied some calculus you may have encountered a method for finding 
the maximum (and minimum) values of a function f(x) by determining points at 
which the derivative df/dx is zero. However, this approach is often inappropriate — it 
may be that the function has a very complicated formula, or indeed as in Figure 4.2 
the derivative may not exist at the maximum point. Alternatively, it may be the case 
that f(x) is the result of some process, so that we are able only to measure f(x) for 
certain values of x in the interval [a, b]. In all these situations the techniques of 
calculus are not useful. We therefore need to search for the maximum point x* in 
some efficient way. 


Figure 4.5

Suppose we select two points x, and x, in the interval [a, b] and find that
F(x_) > f(x,), as shown in Figure 4.5. 

Because f(x) is assumed to be unimodal, it follows that its maximum value 
cannot occur in the interval [a, x, ], because there cannot be a value of f(x) in [a, x, ] 
which is bigger than f(x,). We can therefore disregard this part of the interval when 
computing further values of f(x), and restrict the next step in our search to the 
interval [x,, b]. Suppose that for a third point x, the situation is as shown in Figure 
4.6. By a similar argument it follows that the maximum cannot occur in the interval 
[, b], so we can now discard this also. The maximum value of f(x) must therefore 
occur in the interval [x,, x,] (i.e. x, <x< x,). We can continue this procedure to find 
successively smaller intervals within which x* is located, ending with a final interval 
of uncertainty. We would like to make this final interval as small as possible using a 
fixed number N of points at which f(x) is evaluated. Surprisingly, it turns out that 
the best way of doing this involves the Fibonacci numbers introduced in Example 
1.4, Chapter 1. 

A simple-minded approach is to space out the N points at equal distances along
the interval. The interval of length c = b − a is divided up into N + 1 equal pieces,
and f(x) evaluated at each point. The case N = 3 is shown in Figure 4.7. We select
the point at which f(x) is largest. In Figure 4.7 this is x_3, so x* lies in [x_2, b] which
has length 2c/4. In general the final interval of uncertainty has length 2c/(N + 1).

The Fibonacci process does much better than this. In Chapter 1 we denoted the
Fibonacci numbers by


f_0 = 1, f_1 = 1, f_2 = 2, f_3 = 3, f_4 = 5, f_5 = 8, ...


where each number in the sequence is the sum of the previous two. When N points
are used it turns out that the length of the final interval of uncertainty is 2c/f_{N+1}. For
example, with N = 6 this is


2c/f_7 = 2c/21

compared with 2c/7 if equidistant spacing is used, and the improvement increases
rapidly as N is increased.
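
The two formulas are easy to compare numerically. The sketch below is purely illustrative (the function names are ours); it uses the book's indexing f_0 = f_1 = 1 and returns the final interval lengths 2c/(N + 1) and 2c/f_{N+1} for a given N.

```python
def fibonacci_numbers(count):
    """Return [f_0, f_1, ..., f_(count-1)] with f_0 = f_1 = 1."""
    fib = [1, 1]
    while len(fib) < count:
        fib.append(fib[-1] + fib[-2])
    return fib[:count]


def final_interval_lengths(c, n):
    """Final interval length for equal spacing and for Fibonacci search."""
    fib = fibonacci_numbers(n + 2)
    return 2 * c / (n + 1), 2 * c / fib[n + 1]


for n in [3, 6, 10]:
    equal, fibonacci = final_interval_lengths(1.0, n)
    print(n, round(equal, 4), round(fibonacci, 4))
```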


Figure 4.6


Figure 4.7


The procedure for placing the N points within the interval is as follows: 


Fibonacci Search Algorithm 


Step 1 Evaluate f(x) at two test points x_1 and x_2 located at distances c f_N / f_{N+1} from
each end of the initial interval.


Step 2 According to which of the two values of f(x) is the larger, select the new
interval within which x* must lie.


Step 3 Insert the next point x_3 symmetrically in this new interval with respect to
the point already inside it, and evaluate f(x).


Step 4 Repeat Steps 2 and 3 until all N points have been inserted. x* lies within an
interval of length 2c/f_{N+1}.


You might be a little puzzled as to what is meant by 'symmetrically' in Step 3.
We say two points P_1, P_2 are symmetrically located in an interval AB if the distance
of P_1 from the left-hand end is the same as the distance of P_2 from the right-hand
end, as shown in Figure 4.8. It is in fact the symmetric locating of the points in Steps
1 and 3 which makes the method work.

It’s worth noting how to modify the search procedure if f(x) is to be minimized 
instead of maximized. It is assumed that f(x) has a single minimum point and is unimodal. 
Step 1 is unaltered; in the subsequent steps the new interval is selected according to which 
of the values of f(x) being compared is the smaller. For example, if we are looking for a 
minimum of f(x) in Figure 4.5 then the new interval would be [a, x, ]. 
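
The four steps of the Fibonacci Search Algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the book's own code: the name `fibonacci_search` is hypothetical, f(x) is assumed unimodal with a single maximum on [a, b], N is assumed to be at least 2, and the symmetric point of Step 3 is computed as (left end) + (right end) − (interior point).

```python
def fibonacci_search(f, a, b, n):
    """Return the final interval (a, b) containing the maximum of f after n evaluations."""
    fib = [1, 1]
    while len(fib) < n + 2:
        fib.append(fib[-1] + fib[-2])

    # Step 1: two test points at distance c*f_N/f_(N+1) from each end.
    d = (b - a) * fib[n] / fib[n + 1]
    left, right = b - d, a + d
    f_left, f_right = f(left), f(right)

    for _ in range(n - 2):
        # Step 2: keep the part of the interval that must contain x*.
        if f_left < f_right:
            a, inner, f_inner = left, right, f_right
        else:
            b, inner, f_inner = right, left, f_left
        # Step 3: new point placed symmetrically to the interior point.
        new = a + b - inner
        f_new = f(new)
        if new < inner:
            left, f_left, right, f_right = new, f_new, inner, f_inner
        else:
            left, f_left, right, f_right = inner, f_inner, new, f_new

    # Final selection after all n points have been used (Step 4).
    if f_left < f_right:
        a = left
    else:
        b = right
    return a, b


# Usage: the function of Example 4.3 with N = 5
print(fibonacci_search(lambda x: x * (1.5 - x), 0.0, 1.0, 5))
```

To minimize instead, the same routine can be called with −f(x) in place of f(x), since minimizing f(x) is the same as maximizing −f(x).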


Figure 4.8


182 All the Best 
EXAMPLE 4.2


Let's illustrate the procedure for N = 4. Suppose that a function f(x) is to be
maximized on the interval 0 ≤ x ≤ 1, so that c = 1. From Step 1 the two initial
points are placed at a distance f_4/f_5 = 5/8 = 0.625 from each end, as shown in
Figure 4.9, where x_1 is 0.625 from the left-hand end and x_2 is 0.625 from the
right-hand end, so x_1 = 0.625 and x_2 = 0.375.


Figure 4.9 


These two points are symmetrically located in the interval [0, 1]. Suppose
f(x_1) > f(x_2), so in Step 2 we delete the interval [0, x_2] since we are assuming
that f(x) is unimodal. In Step 3 the next point x_3 is placed 0.375 from the
left-hand end of the new interval [x_2, 1], so that it is symmetrically located relative
to the interior point x_1, as shown in Figure 4.10.


Figure 4.10


If we suppose f(x_3) < f(x_1) then the next new interval is [x_2, x_3]. The final point
x_4 is located at a distance


x_3 − x_1 = 0.75 − 0.625 = 0.125


from the left-hand end, as shown in Figure 4.11.


Figure 4.11


If we suppose f(x_4) > f(x_1) then we drop the interval [x_1, x_3]. The final
interval of uncertainty is therefore [x_2, x_1].

The following table shows the end points of the successive intervals within
which x* lies:


                     End points         Length of interval
Initial interval     0        1         1
After first step     0.375    1         0.625
Second step          0.375    0.75      0.375
Third step           0.375    0.625     0.25


You can see that the final interval of uncertainty has length 0.625 − 0.375 = 0.25,
which agrees with the theoretical value of 2c/f_5 = 2/8. Notice also that, reading
downwards in the last column, the length of each interval is the sum of the two
below, that is


1= 0.625 + 0.375 
0.625 = 0.375 + 0.25 


This rule is exactly the same as that used to generate Fibonacci numbers, and 
this is essentially the reason why they arise in this context. 


EXAMPLE 4.3 


Given the function

f(x) = x(1.5 − x), 0 ≤ x ≤ 1


we apply Fibonacci search with five evaluations of f(x) (i.e. N = 5) to estimate
the value of x* which maximizes f(x). The length of the final interval of
uncertainty within which x* lies will be


2c/f_6 = 2/13 = 0.1538
We begin at Step 1 by taking


x_1 = f_5/f_6 = 8/13 = 0.6154, x_2 = 1 − 0.6154 = 0.3846


and compute

f(x_1) = 0.544, f(x_2) = 0.429


as shown in Figure 4.12.


Figure 4.12


In Step 2 we therefore select the new interval [x_2, 1] and in Step 3 we take (see
Figure 4.13)


x_3 = 1 − 0.2308 = 0.7692


Figure 4.13


which produces f(x_3) = 0.562 > f(x_1). We next select the interval [x_1, 1] and take
x_4 = 0.8462 (see Figure 4.14) giving f(x_4) = 0.553 < f(x_3).
Figure 4.14


The next new interval is [x_1, x_4] and we take x_5 = 0.6924 (see Figure 4.15) with
f(x_5) = 0.559 < f(x_3).


Figure 4.15


The final interval of uncertainty is therefore [x_5, x_4], that is
0.6924 ≤ x ≤ 0.8462, which has length 0.1538, agreeing with the theoretical value.
The midpoint of this interval is 0.7693, so we can say that x* = 0.7693 ± 0.0769.
Again it is instructive to list the successive intervals of uncertainty with their
lengths:


                     End points          Length of interval
Initial interval     0         1         1
[x_2, 1]             0.3846    1         0.6154
[x_1, 1]             0.6154    1         0.3846
[x_1, x_4]           0.6154    0.8462    0.2308
[x_5, x_4]           0.6924    0.8462    0.1538


As in the previous example you can see that the interval lengths obey the
Fibonacci summation rule:

1 = 0.6154 + 0.3846

0.6154 = 0.3846 + 0.2308

0.3846 = 0.2308 + 0.1538


Since the derivative of f(x) is


df/dx = 1.5 − 2x

and df/dx = 0 at the maximum point, the exact value of x* is 0.75. This does lie
within the interval of uncertainty which we found, but is not the midpoint.


EXERCISE 4.3 Repeat Example 4.3 using N= 6. 


EXERCISE 4.4 If the maximum value of a unimodal function f(x) defined on 1 <x<5 is 
to be located within an interval of uncertainty of length not more than 0.05, how 
many function evaluations will be needed using Fibonacci search? 


EXERCISE 4.5 Use Fibonacci search with seven function evaluations to estimate the value 
of x which minimizes 


fiyext-4x45, 1<x<4 


EXERCISE 4.6 Estimate the dimensions of the rectangle of maximum area in Exercise 4.1 
using Fibonacci search with eight function evaluations. 


4.2 LINEAR PROGRAMMING 


EXAMPLE 4.4


The manager of a small convenience store decides to stock two brands of ice 
cream, A and B, but believes that likely sales will only justify ordering a total of 
at most 25 cartons. The space available in the freezer for ice cream is at most 36 
cubic units. Because of different packaging, a carton of brand A occupies 1 
cubic unit, but brand B takes up 2 cubic units per carton. The profit on brand A is
60 pence per carton and on brand B 90 pence per carton. How much of each brand
should be stocked so as to make the maximum profit?

Let $x_1$, $x_2$ denote the numbers of cartons of brands A and B respectively
which are stocked. The constraint on the total number of cartons is

$$x_1 + x_2 \le 25 \tag{4.1}$$

The volume occupied by the cartons is $x_1 + 2x_2$ so the constraint on available
space is

$$x_1 + 2x_2 \le 36 \tag{4.2}$$

Notice that because of the way they are defined both $x_1$ and $x_2$ are
non-negative, that is

$$x_1 \ge 0, \quad x_2 \ge 0 \tag{4.3}$$

We wish to find the values of $x_1$ and $x_2$ which maximize the profit, which is (in
pence)

$$z = 60x_1 + 90x_2 \tag{4.4}$$


This is a typical example of what is called a linear programming (LP) problem. 
This name was invented in 1951 when ‘programming’ was a fashionable new 
scientific term, much like ‘fractals’ or ‘chaos’ today. Computer programming 
was a very new discipline, and terms like ‘mathematical programming’, 
‘quadratic programming’ and ‘dynamic programming’ were coined to 
emphasize their novelty, and also their need for computers to perform 
multitudinous computations. The description linear refers to the fact that the
expressions on the left-hand sides of the inequalities in (4.1) and (4.2) are linear
combinations of the variables $x_1$ and $x_2$, and so is the profit function $z$ in (4.4).
We pointed out in Section 3.3 in Chapter 3 that a linear combination of
quantities $x_1, x_2, x_3, \ldots$ simply means the expression

$$a_1 x_1 + a_2 x_2 + a_3 x_3 + \cdots$$

where $a_1, a_2, a_3, \ldots$ are constants, not all zero. In this example, however, we can
also show how ‘linear’ relates to the geometrical meaning of straight line. Let's
look at the inequality (4.1): when equality holds we have

$$x_1 + x_2 = 25 \tag{4.5}$$


which is the straight line AB in Figure 4.16(a), remembering that we are
restricted to the quadrant $x_1 \ge 0$, $x_2 \ge 0$.

[Figure 4.16: (a) the line $x_1 + x_2 = 25$, with the point (25, 0) marked]


Consider next the equation

$$x_1 + x_2 = 24 \quad (< 25)$$

This equation corresponds to the dashed line in Figure 4.16(a). Similarly, all
equations of the form

$$x_1 + x_2 = k$$

with $k < 25$ are represented by parallel lines inside the triangle OAB. Remembering
that we cannot have $x_1$ or $x_2$ negative, we have therefore found that the
inequality (4.1) corresponds to the hatched region OAB in Figure 4.16(b). Using
an exactly similar argument, the inequality (4.2) together with (4.3) corresponds
to the hatched region OCD shown in Figure 4.17.


[Figure 4.17]


By the way, it’s easy to find the coordinates of the vertices A, B, C, D. Simply
set $x_2 = 0$ in (4.5) to get A = (25, 0), and similarly $x_1 = 0$ gives B = (0, 25); C and D
are found from $x_1 + 2x_2 = 36$ in the same way. The overall region which satisfies
all three of (4.1), (4.2) and (4.3) is obtained by putting together the regions OAB,
OCD as shown by the hatched region in Figure 4.18.


[Figure 4.18: the feasible region, with vertex E = (14, 11)]


The point E is where the two lines AB, CD intersect, and its coordinates are 
found by solving the simultaneous equations 


$$x_1 + x_2 = 25, \qquad x_1 + 2x_2 = 36$$


to give E=(14,11). The region OAED is called the feasible region since it 
contains all the points which satisfy the constraints (4.1), (4.2) and (4.3). Any 
such point is called a feasible solution to the LP problem. 

The profit function z in (4.4) can also be represented in terms of straight 
line graphs. For example, if z= 180 we have the straight line 


$$60x_1 + 90x_2 = 180$$


which is shown in Figure 4.19 joining the points (0, 2) and (3, 0). As the value of 
z increases we obtain a series of parallel lines which move away from the 
origin: the case z= 360 is also shown in Figure 4.19. 


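[Figure 4.19: the lines z = 180, through (0, 2) and (3, 0), and z = 360, through (0, 4) and (6, 0); z increases away from the origin]

[Figure 4.20: the feasible region with the dashed z-line; z increases away from the origin]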
Our LP problem can now be stated in geometrical terms: what is the furthest
from the origin that the ‘z-line’ can be whilst still remaining in the feasible
region? The answer is shown in Figure 4.20: the maximum value of $z$ is attained
when the dashed line passes through the vertex E, which has coordinates
$x_1 = 14$, $x_2 = 11$. That is, stocking 14 cartons of brand A and 11 of brand B
produces the maximum possible profit

$$z = 60 \times 14 + 90 \times 11 = 1830 \text{ pence}$$

You may be thinking that it’s fortunate that the solution came out in integers:
how could the store stock a fractional number of cartons? This difficulty will be
discussed later, in the next section.
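
The geometrical argument can be checked numerically. The sketch below intersects the boundary lines of (4.1), (4.2) and (4.3) in pairs, keeps the intersections that satisfy every constraint, and evaluates the profit (4.4) at each. It is an illustration for this two-variable example only; the helper names `intersection` and `feasible` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint (a1, a2, b) means a1*x1 + a2*x2 <= b.
constraints = [
    (1, 1, 25),   # (4.1)
    (1, 2, 36),   # (4.2)
    (-1, 0, 0),   # x1 >= 0, from (4.3)
    (0, -1, 0),   # x2 >= 0, from (4.3)
]


def intersection(c1, c2):
    """Point where the two boundary lines meet, or None if they are parallel."""
    (a, b, e), (c, d, f) = c1, c2
    det = a * d - b * c
    if det == 0:
        return None
    return (Fraction(e * d - b * f, det), Fraction(a * f - e * c, det))


def feasible(point):
    return all(a * point[0] + b * point[1] <= rhs for a, b, rhs in constraints)


vertices = set()
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and feasible(point):
        vertices.add(point)

profit = {v: 60 * v[0] + 90 * v[1] for v in vertices}  # (4.4)
best = max(profit, key=profit.get)
print(sorted(vertices), best, profit[best])
```

The feasible intersections are the four corners of the feasible region, and the largest profit, 1830 pence, occurs at E = (14, 11).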


EXERCISE 4.7 A tailor has 80 m² of polyester material and 120 m² of wool material
available each week. A suit requires 1 m² of polyester and 3 m² of wool, whereas a
dress requires 2 m² of each. Let $x_1$ and $x_2$ be the numbers of suits and dresses which
the tailor makes per week. The tailor sells each garment for £50, and wishes to
maximize the weekly income. Express this as an LP problem.


EXERCISE 4.8 Sketch the feasible region for the problem in Exercise 4.7. 


EXERCISE 4.9 Using your diagram in the preceding problem, find the solution for 
Exercise 4.7. 


EXERCISE 4.10 If in Exercise 4.7 the prices change to £80 for a suit and £40 for a dress, 
determine the new solution. 


EXAMPLE 4.5
Let’s now tackle the problem in Example 4.4 using an algebraic approach. The
total number of cartons stocked is $x_1 + x_2$, so the number of unfulfilled sales is

$$s_1 = 25 - x_1 - x_2 \tag{4.6}$$

and $s_1$ is called a slack variable. Clearly in view of this definition and the
constraint (4.1) we have $s_1 \ge 0$. Similarly from (4.2) the unused storage space is

$$s_2 = 36 - x_1 - 2x_2 \tag{4.7}$$

with $s_2 \ge 0$. Hence the constraints (4.1) and (4.2) can be replaced by the
equations (4.6) and (4.7), which can be rearranged to give

$$\begin{aligned} x_1 + x_2 + s_1 &= 25 \\ x_1 + 2x_2 + s_2 &= 36 \end{aligned} \tag{4.8}$$

with the condition that all the variables $x_1$, $x_2$, $s_1$ and $s_2$ must be non-negative
($\ge 0$).
Subtracting the first equation in (4.8) from the second gives

$$x_2 = 11 + s_1 - s_2 \tag{4.9}$$

From the first equation in (4.8) we then get

$$x_1 = 25 - x_2 - s_1 = 14 - 2s_1 + s_2 \quad \text{using (4.9)} \tag{4.10}$$

Substituting the expressions (4.9) and (4.10) into the profit function in (4.4)
gives

$$\begin{aligned} z &= 60x_1 + 90x_2 \\ &= 60(14 - 2s_1 + s_2) + 90(11 + s_1 - s_2) \\ &= 1830 - 30s_1 - 30s_2 \end{aligned} \tag{4.11}$$

Since $s_1$ and $s_2$ cannot be negative, the maximum value of $z$ in (4.11) must
occur when $s_1 = 0$, $s_2 = 0$, in which case $z = 1830$. From (4.9) and (4.10) we then
have $x_1 = 14$, $x_2 = 11$. This agrees with the result found graphically (see Figure
4.20).
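
As a quick arithmetic check of the substitution, the sketch below evaluates (4.9), (4.10) and (4.4) for a few slack values and confirms that the results satisfy (4.8) and agree with (4.11). The function name `from_slacks` is hypothetical.

```python
def from_slacks(s1, s2):
    """Return (x1, x2, z) for given values of the slack variables."""
    x2 = 11 + s1 - s2        # (4.9)
    x1 = 14 - 2 * s1 + s2    # (4.10)
    z = 60 * x1 + 90 * x2    # (4.4)
    return x1, x2, z


for s1 in range(4):
    for s2 in range(4):
        x1, x2, z = from_slacks(s1, s2)
        # The original equations (4.8) still hold ...
        assert x1 + x2 + s1 == 25 and x1 + 2 * x2 + s2 == 36
        # ... and the profit agrees with (4.11).
        assert z == 1830 - 30 * s1 - 30 * s2

print(from_slacks(0, 0))
```

Any positive slack lowers the profit by 30 pence per unit, which is why both slacks are zero at the optimum.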


Sometimes an inequality constraint arises the other way round, for example we 
might have 


$$x_1 + 3x_2 \ge 9$$

In this case we define the corresponding slack variable by

$$s = x_1 + 3x_2 - 9$$

so that $s \ge 0$, and the inequality is replaced by the equation

$$x_1 + 3x_2 - s = 9$$


The crucial thing is to keep all the variables non-negative. 


EXERCISE 4.11 Solve algebraically the LP problem:
Maximize    $z = 2x_1 + 3x_2$
Subject to: $2x_1 + 4x_2 \le 13$
            $3x_1 + 2x_2 \le 11$
            $x_1 \ge 0$, $x_2 \ge 0$


EXERCISE 4.12 Solve algebraically the LP problem:
Minimize    $z = 3x_1 + 2x_2$
Subject to: $x_1 + 3x_2 \ge 9$
            $2x_1 + x_2 \ge 14$
            $x_1 \ge 0$, $x_2 \ge 0$


When there are more than two variables the graphical technique cannot be used. 
However, a powerful approach called the simplex method can be applied (using 
appropriate software) to solve LP problems with hundreds of variables and 
constraints. The basic ideas of the method are illustrated in the following example. 
EXAMPLE 4.6


Consider the LP problem:

$$\text{Maximize } z = 5x_1 - x_2 \tag{4.12}$$

$$\text{Subject to: } \begin{aligned} x_1 + x_2 &\le 5 \\ 2x_1 + 3x_2 &\le 11 \\ -x_1 + 2x_2 &\le 2 \end{aligned} \tag{4.13}$$

$$x_1 \ge 0, \quad x_2 \ge 0$$

We first write the inequalities (4.13) as equations by introducing slack variables
$s_1 \ge 0$, $s_2 \ge 0$, $s_3 \ge 0$, so that

$$\begin{aligned} x_1 + x_2 + s_1 &= 5 \\ 2x_1 + 3x_2 + s_2 &= 11 \\ -x_1 + 2x_2 + s_3 &= 2 \end{aligned} \tag{4.14}$$

You must remember that throughout none of the variables can be negative.
Since there are three equations in (4.14) we select three of the variables and
solve for them in terms of the other two variables. A basic solution is one
where three variables are positive and the other two are zero.


(i) Choose $x_1$, $x_2$, $s_1$, which gives

$$\begin{aligned} x_1 &= \tfrac{16}{7} - \tfrac{2}{7}s_2 + \tfrac{3}{7}s_3 \\ x_2 &= \tfrac{15}{7} - \tfrac{1}{7}s_2 - \tfrac{2}{7}s_3 \\ s_1 &= \tfrac{4}{7} + \tfrac{3}{7}s_2 - \tfrac{1}{7}s_3 \end{aligned} \tag{4.15}$$

Substituting into (4.12) produces

$$z = \tfrac{65}{7} - \tfrac{9}{7}s_2 + \tfrac{17}{7}s_3 \tag{4.16}$$

A basic solution to the problem is obtained by taking $s_2 = 0$, $s_3 = 0$ in
(4.15), which produces $x_1 = 16/7$, $x_2 = 15/7$ and $s_1 = 4/7$. In this case
$z = 65/7$. However, we can see from (4.16) that this solution is not
optimal, since by making $s_3$ positive we can increase the value of $z$.
Keeping $s_2 = 0$, (4.15) gives us

$$x_1 = \tfrac{1}{7}(16 + 3s_3), \quad x_2 = \tfrac{1}{7}(15 - 2s_3), \quad s_1 = \tfrac{1}{7}(4 - s_3) \tag{4.17}$$

Since $x_1$, $x_2$ and $s_1$ in (4.17) are not allowed to be negative, you can
see that the largest value of $s_3$ which we can take is $s_3 = 4$. In this case
$s_1 = 0$, which means that in the trio $\{x_1, x_2, s_1\}$ of positive variables $s_1$ is
now replaced by $s_3$:

(ii) The selected variables are now $x_1$, $x_2$ and $s_3$. Re-solve the original
equations (4.14) to obtain

$$\begin{aligned} x_1 &= 4 - 3s_1 + s_2 \\ x_2 &= 1 + 2s_1 - s_2 \\ s_3 &= 4 - 7s_1 + 3s_2 \end{aligned} \tag{4.18}$$

Substituting these expressions into the profit function (4.12) gives

$$z = 19 - 17s_1 + 6s_2 \tag{4.19}$$

Hence a second basic solution is obtained by taking $s_1 = 0$, $s_2 = 0$,
which gives $x_1 = 4$, $x_2 = 1$, $s_3 = 4$ and $z = 19$. Notice that this value of $z$ is
larger than before. Again we see that $z$ in (4.19) can be made yet
larger, this time by increasing the value of $s_2$. Looking at the expressions
(4.18) you can see that with $s_1 = 0$, the largest possible value of
$s_2$ is $s_2 = 1$, in which case $x_2 = 0$. We therefore replace $x_2$ by $s_2$ in the
trio of positive variables, giving:


(iii) The selected variables are now $x_1$, $s_2$, $s_3$. The solution of (4.14) is

$$\begin{aligned} x_1 &= 5 - x_2 - s_1 \\ s_2 &= 1 - x_2 + 2s_1 \\ s_3 &= 7 - 3x_2 - s_1 \end{aligned}$$

and

$$z = 25 - 6x_2 - 5s_1$$

You can see from this last expression that $z$ will achieve its maximum
value, equal to 25, when $x_2 = 0$, $s_1 = 0$. The values of the other
variables are then $x_1 = 5$, $s_2 = 1$, $s_3 = 7$ and we have found the optimal
solution.


It’s instructive to look at a graphical interpretation of the algebraic work we’ve 
just done in steps (i), (ii) and (iii). The feasible region described by the inequalities 
(4.13) is the hatched area OABCD in Figure 4.21. This diagram is built up in the 
same way as Figure 4.18 in Example 4.4. The profit function z in (4.12) is 
represented by the dashed line in Figure 4.21, and is obtained by the same arguments 
used in Figure 4.19. For example, when z= 10 the line represented by (4.12) passes 
through F = (2,0) and G= (0, —10). When z= 20 the line is further away from the 
origin in the direction of the arrow in Figure 4.21. 

For step (i) the basic solution is $s_2 = 0$, $s_3 = 0$ which is the point B in Figure 4.21;
at the next step we move to $s_1 = 0$, $s_2 = 0$ which is the point C; finally, we move to
$s_1 = 0$, $x_2 = 0$ which is the point D.

As shown in Figure 4.22, each basic solution corresponds to a corner of the
feasible region. The key idea of the simplex method is that we move from one
corner to another in such a way that the profit function z is increased, as shown by
the dashed parallel lines in Figure 4.22. This continues until it is not possible to
increase z any further: in this example, we cannot go beyond the point D without
going outside the feasible region.

[Figure 4.22]
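
The correspondence between basic solutions and corners can be illustrated by brute force. The sketch below is not the simplex method: instead of moving from corner to corner, it simply tries every choice of three variables in (4.14), sets the other two to zero, solves the resulting equations exactly, and keeps the solutions with no negative values. The function names are hypothetical, and the approach is only practical for very small problems.

```python
from fractions import Fraction
from itertools import combinations

NAMES = ["x1", "x2", "s1", "s2", "s3"]
A = [[1, 1, 1, 0, 0],     # coefficients of (4.14)
     [2, 3, 0, 1, 0],
     [-1, 2, 0, 0, 1]]
b = [5, 11, 2]
c = [5, -1, 0, 0, 0]      # profit function (4.12)


def solve(M, rhs):
    """Gauss-Jordan elimination with exact fractions; None if singular."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        aug[col] = [v / aug[col][col] for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def basic_feasible_solutions():
    found = []
    for basis in combinations(range(5), 3):
        columns = [[row[j] for j in basis] for row in A]
        values = solve(columns, b)
        if values is None or min(values) < 0:
            continue  # singular system, or a variable would be negative
        x = [Fraction(0)] * 5
        for j, v in zip(basis, values):
            x[j] = v
        z = sum(ci * xi for ci, xi in zip(c, x))
        found.append((z, dict(zip(NAMES, x))))
    return sorted(found, key=lambda item: item[0])


corners = basic_feasible_solutions()
best_z, best_x = corners[-1]
for z, x in corners:
    print(z, {name: str(value) for name, value in x.items()})
```

Five of the ten possible choices give feasible solutions, one for each corner of the region OABCD, and the largest value z = 25 occurs at $x_1 = 5$, $x_2 = 0$, in agreement with step (iii).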


EXERCISE 4.13 Solve the following LP problems graphically, and by using the simplex
procedure.
Subject to: $x_1 + x_2 \le 3$
            $x_1 - 2x_2 \le 1$
            $-2x_1 + x_2 \le 2$
            $x_1 \ge 0$, $x_2 \ge 0$

(a) maximize $z = x_1 - x_2$
(b) minimize $z = x_1 - x_2$.




In practice the arithmetical operations involved in carrying out the simplex 
method can be made purely mechanical by expressing them in a tabular form. 
Details of this are rather tedious and can be found in textbooks listed at the end of 
the chapter. To solve a practical problem you would use standard available 
software. 


4.3 TRANSPORTATION MODELS 


A special form of LP problem involves the transportation of goods from a set of 
depots to a set of destinations, the aim being to do this as cheaply as possible. 


EXAMPLE 4.7


Suppose there are four warehouses, which are to supply three supermarkets 
with certain goods. Each supermarket requires 5 units, but the amounts 
available at the warehouses are respectively 1, 6, 2, 6 units: notice that the 
total supply does equal the total demand (15 units). The cost of transporting 
one item from a warehouse to a supermarket is shown in the following table: 


                         Warehouses
                    (1)   (2)   (3)   (4)
               (1)   5     4     3     2
Supermarkets   (2)  10     8     4     7          (4.20)
               (3)   9     9     8     4


The number $c_{ij}$ in row $i$, column $j$ in (4.20) is the cost of transporting 1 unit to
supermarket ($i$) from warehouse ($j$). For example, the unit transportation cost
to supermarket (2) from warehouse (4) is $c_{24} = 7$. Let's denote by $x_{ij}$ the number
of units which come to supermarket $i$ from warehouse $j$, as shown in (4.21).

                         Availabilities
                     1        6        2        6
                5    x_{11}   x_{12}   x_{13}   x_{14}
Requirements    5    x_{21}   x_{22}   x_{23}   x_{24}      (4.21)
                5    x_{31}   x_{32}   x_{33}   x_{34}


For the first supermarket, the incoming total is required to be 5 units, so adding
up the first row gives

$$x_{11} + x_{12} + x_{13} + x_{14} = 5 \tag{4.22}$$

An exactly similar operation applies for each of the other two supermarkets, so
adding the elements in the other two rows gives

$$\begin{aligned} x_{21} + x_{22} + x_{23} + x_{24} &= 5 \\ x_{31} + x_{32} + x_{33} + x_{34} &= 5 \end{aligned} \tag{4.23}$$

Considering the goods which are despatched, the amounts going out from the
first depot are given in the first column, so adding these gives

$$x_{11} + x_{21} + x_{31} = 1 \tag{4.24}$$

Similarly for the other three depots we must have

$$\begin{aligned} x_{12} + x_{22} + x_{32} &= 6 \\ x_{13} + x_{23} + x_{33} &= 2 \\ x_{14} + x_{24} + x_{34} &= 6 \end{aligned} \tag{4.25}$$

Notice that the constraints this time are equations instead of inequalities, but
obviously we must still have all $x_{ij} \ge 0$: we can’t transport a negative number of
items! To find the total cost z which is to be minimized, we multiply each unit
cost given in (4.20) by the corresponding amount transported in (4.21), to end
up with

$$\begin{aligned} z = {} & 5x_{11} + 4x_{12} + 3x_{13} + 2x_{14} \\ & + 10x_{21} + 8x_{22} + 4x_{23} + 7x_{24} \\ & + 9x_{31} + 9x_{32} + 8x_{33} + 4x_{34} \end{aligned} \tag{4.26}$$
You can see that the transportation problem is a special kind of LP 
problem: we wish to minimize the linear cost function (4.26) subject to the 
linear constraints (4.22), (4.23), (4.24) and (4.25). In fact, one of these seven 
equations is redundant, because of the condition that the total of units 
available is equal to the total number required. This means that the 
expression obtained by adding together the three equations (4.22) and (4.23) 


is identical to that obtained by adding together the four equations (4.24) and 
(4.25). 


However, because of its special form we don’t use the simplex method to solve
transportation problems, but instead develop a special procedure. First, to find a
feasible solution (i.e. one which satisfies all the constraints) we apply the northwest
corner method, so called because we start in the ‘northwest corner’ in (4.21) with the
cell containing $x_{11}$. We wish to fill in the cells in the array (4.27).


                    Availabilities
                    1    6    2    6
               5
Requirements   5                          (4.27)
               5

This is to be done in such a way that the row sums and column sums add up to the
correct amounts. It’s clear that the most we can put into the (1, 1) cell is 1, because
of the 1 at the top of the first column; the supply from the first warehouse is
therefore exhausted, so no more entries can go into the first column. However,
supermarket (1) still requires 4 units, so we move along the first row to the next cell
(i.e. the northwest corner of the remaining array), and see that we can supply all 4
units from warehouse (2). At this stage we therefore have the array in (4.28):


[Array (4.28): 1 unit in cell (1, 1) and 4 units in cell (1, 2)]


The slashed-through numbers indicate that these are finished with; the numbers
within the circles indicate the sequence in which we have filled the cells. We now put
the maximum possible amount into the northwest corner of the array which is left;
clearly this is 2 in the (2, 2) cell. We need a total of 5 units in the second row, but
can only put 2 into the (2, 3) cell because only 2 units are available for the
third column. To complete this second row, we therefore need 1 unit in the (2, 4)
cell, giving the array in (4.29):


(4.29) 


The only way to complete the solution is to put 5 units into the (3, 4) cell, giving the 
feasible solution shown in (4.30). 


     1  6  2  6
5    1  4  0  0
5    0  2  2  1      (4.30)
5    0  0  0  5


The cost of this solution is obtained from (4.26) as 
$$z = 5 \times 1 + 4 \times 4 + 8 \times 2 + 4 \times 2 + 7 \times 1 + 4 \times 5 = 72$$
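
The northwest corner method is mechanical enough to express in a few lines. The sketch below assumes, as in this example, that total availability equals total requirement; the function name `northwest_corner` is hypothetical, and the cost table is the one read off from (4.26).

```python
def northwest_corner(requirements, availabilities):
    """Fill cells starting from the top-left, moving right when a row still
    needs units and down when the row's requirement has been met."""
    need = list(requirements)
    have = list(availabilities)
    x = [[0] * len(have) for _ in need]
    i = j = 0
    while i < len(need) and j < len(have):
        q = min(need[i], have[j])   # the most that can go into cell (i, j)
        x[i][j] = q
        need[i] -= q
        have[j] -= q
        if need[i] == 0:
            i += 1                  # this supermarket is satisfied
        else:
            j += 1                  # this warehouse is exhausted
    return x


costs = [[5, 4, 3, 2],
         [10, 8, 4, 7],
         [9, 9, 8, 4]]

solution = northwest_corner([5, 5, 5], [1, 6, 2, 6])
total = sum(costs[i][j] * solution[i][j] for i in range(3) for j in range(4))
print(solution, total)
```

For the data of Example 4.7 this reproduces the allocation in (4.30) and its cost of 72. Notice that the costs play no part in constructing the allocation.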


A glance at the table of costs in (4.20) reveals that the two cheapest routes
(namely, supermarket (1) from warehouses (3) and (4)) are not used in the solution
(4.30). It therefore seems sensible to try a scheme which starts with the cheapest
route, assigns as much as possible to this, then uses the next cheapest route
available, and so on. The result of this procedure is shown in the array (4.31),
where again the numbers within circles indicate the sequence in which the cells are
filled. However, using (4.26) you can easily check that the cost of this solution is
z = 82, which is dearer than before! The ‘common sense’ approach to finding a
solution by concentrating on the cheapest routes therefore won’t work in general.

     1  6  2  6
5    0  0  0  5
5    0  3  2  0      (4.31)
5    1  3  0  1
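
The ‘cheapest route first’ scheme can be sketched in the same way. This is only an illustration of the greedy idea described above; ties between equally cheap routes are broken here by cell position, which is an arbitrary choice, and the function name `cheapest_first` is hypothetical.

```python
def cheapest_first(requirements, availabilities, costs):
    """Greedy allocation: repeatedly use the cheapest route that can still
    carry something, and send as much as possible along it."""
    need = list(requirements)
    have = list(availabilities)
    x = [[0] * len(have) for _ in need]
    cells = sorted((costs[i][j], i, j)
                   for i in range(len(need)) for j in range(len(have)))
    for _, i, j in cells:
        q = min(need[i], have[j])
        x[i][j] = q
        need[i] -= q
        have[j] -= q
    return x


costs = [[5, 4, 3, 2],
         [10, 8, 4, 7],
         [9, 9, 8, 4]]

greedy = cheapest_first([5, 5, 5], [1, 6, 2, 6], costs)
greedy_cost = sum(costs[i][j] * greedy[i][j] for i in range(3) for j in range(4))
print(greedy, greedy_cost)
```

The greedy allocation costs 82, compared with 72 for the northwest corner solution: using the cheap routes early forces expensive routes to be used later.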


Clearly, a more systematic method is needed to find the best solution. Begin with
the first solution displayed in (4.30), and define shadow costs $u_1, u_2, u_3, v_1, v_2, v_3,
v_4$. These variables are chosen so that the cost $c_{ij}$ in (4.20) for each cell $(i, j)$ in
(4.30) which contains a non-zero entry satisfies

$$c_{ij} = u_i + v_j \tag{4.32}$$

You can think of the $v_j$ as ‘dispatch’ costs, and the $u_i$ as ‘reception’ costs, so (4.32)
says that each cost for an occupied cell is the sum of a dispatch cost and a reception
cost. For the solution in (4.30) the situation is as follows:

                                      Dispatch from warehouses
                                      v_1   v_2   v_3   v_4
                             u_1       1     4
Reception at supermarkets    u_2             2     2     1
                             u_3                         5

By (4.32) the shadow costs must satisfy the equations

$$\begin{aligned} & u_1 + v_1 = 5, \quad u_1 + v_2 = 4 \\ & u_2 + v_2 = 8, \quad u_2 + v_3 = 4, \quad u_2 + v_4 = 7 \\ & u_3 + v_4 = 4 \end{aligned}$$
Since there are seven unknowns but only six equations the solution is not unique. We
can arbitrarily set any one of the variables, say $v_1$, equal to zero and solve for the
rest. The solution in this case for the shadow costs is shown in (4.33). Notice that
some can be negative.

            v_1 = 0   v_2 = -1   v_3 = -5   v_4 = -2
u_1 = 5        1         4
u_2 = 9                  2          2          1          (4.33)
u_3 = 6                                        5

Next, form the array whose entry for each empty cell $(i, j)$ in
(4.33) is $u_i + v_j - c_{ij}$, where the $c_{ij}$ are the costs in (4.20). We obtain the table (4.34),
in which occupied cells are marked *, where for example the (1, 3) entry is

$$u_1 + v_3 - c_{13} = 5 - 5 - 3 = -3$$

      *    *   -3    1
     -1    *    *    *          (4.34)
     -3   -4   -7    *


The total cost will be reduced if we transport items via the route which 
corresponds to the positive entry in (4.34), namely from warehouse (4) to 
supermarket (1). This will replace one of the routes used in the original solution 
(4.30). We transfer as many as possible into this new cell (1, 4) without making any 
other entry negative. Inspection of (4.30) shows that we can put at most 1 unit into 
cell (1, 4), otherwise the entry in (2,4) becomes negative. Our improved solution is 
now 


     1  6  2  6
5    1  3  0  1
5    0  3  2  0      (4.35)
5    0  0  0  5

and new shadow costs are shown in (4.36):

            v_1 = 0   v_2 = -1   v_3 = -5   v_4 = -3
u_1 = 5        1         3                     1
u_2 = 9                  3          2                     (4.36)
u_3 = 7                                        5

The table $u_i + v_j - c_{ij}$ for empty cells in (4.36) is shown in (4.37), with occupied
cells marked *.

      *    *   -3    *
     -1    *    *   -1          (4.37)
     -2   -3   -6    *

The entries in (4.37) are all negative, showing that the solution in (4.35) is optimal.
The cost in (4.26) is now z = 71, which is the smallest possible.

An explanation of why the method works is as follows. At each step we calculate
the entries in the table $u_i + v_j - c_{ij}$ for cells which are empty in the current solution. If
there are any positive entries we select the largest, and put as much as possible into the
corresponding cell. Suppose a cell which is not currently utilized as a transportation
route is cell (2, 1). If we enter 1 unit into this cell, then in order to balance we have the
situation in (4.38), where the other three cells shown are occupied in the current
solution; that is, cells $(2, k)$, $(j, 1)$ and $(j, k)$ contain non-zero entries.

                   column 1        column k
                     v_1             v_k
row 2    u_2         +1     ...      -1
                      :               :                (4.38)
row j    u_j         -1     ...      +1

The change in the cost due to the changes in the solution shown in (4.38) is

$$c_{21} - c_{2k} - c_{j1} + c_{jk} = c_{21} - (u_2 + v_k) - (u_j + v_1) + (u_j + v_k) \tag{4.39}$$

$$= c_{21} - (u_2 + v_1) \tag{4.40}$$

where (4.39) follows from the definition (4.32) of the shadow costs for occupied
cells. You can see from (4.40) that there will be an improvement (i.e. the cost is
reduced) if $c_{21} < u_2 + v_1$. Hence we use cell (2, 1) if $u_2 + v_1 - c_{21} > 0$. The procedure
is repeated, selecting one new cell to be occupied at each iteration, until there are no
positive entries in the table $u_i + v_j - c_{ij}$.
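
The optimality test can be sketched in code. The functions below solve (4.32) on the occupied cells with $v_1 = 0$ and then tabulate $u_i + v_j - c_{ij}$ for the empty cells. This is a sketch under two assumptions: the cost table is the one read off from (4.26), and the occupied cells are connected well enough to determine every shadow cost (which holds for the solutions of this example, but can fail for degenerate ones). It only performs the test; it does not carry out the reallocation step. The function names are hypothetical.

```python
COSTS = [[5, 4, 3, 2],
         [10, 8, 4, 7],
         [9, 9, 8, 4]]


def shadow_costs(x, costs):
    """Solve c_ij = u_i + v_j on the occupied cells, fixing v_1 = 0."""
    m, n = len(costs), len(costs[0])
    u, v = [None] * m, [None] * n
    v[0] = 0
    occupied = [(i, j) for i in range(m) for j in range(n) if x[i][j] > 0]
    changed = True
    while changed:
        changed = False
        for i, j in occupied:
            if u[i] is None and v[j] is not None:
                u[i] = costs[i][j] - v[j]
                changed = True
            elif v[j] is None and u[i] is not None:
                v[j] = costs[i][j] - u[i]
                changed = True
    return u, v


def improvement_table(x, costs):
    """Entries u_i + v_j - c_ij for empty cells, keyed by 1-based (i, j)."""
    u, v = shadow_costs(x, costs)
    return {(i + 1, j + 1): u[i] + v[j] - costs[i][j]
            for i in range(len(costs)) for j in range(len(costs[0]))
            if x[i][j] == 0}


first = [[1, 4, 0, 0], [0, 2, 2, 1], [0, 0, 0, 5]]      # solution (4.30)
improved = [[1, 3, 0, 1], [0, 3, 2, 0], [0, 0, 0, 5]]   # solution (4.35)

print(shadow_costs(first, COSTS), improvement_table(first, COSTS))
print(shadow_costs(improved, COSTS), improvement_table(improved, COSTS))
```

For (4.30) the only positive entry is in cell (1, 4); for (4.35) every entry is negative, so no further improvement is possible.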

One point you should be aware of: for both LP and transportation problems, 
although the optimal value of the cost (or profit) function is unique, there may be 
more than one set of values of the variables which produces this — hence we look for 
an optimal solution, as it may not be unique. 


EXERCISE 4.14 Use the northwest corner method to find a feasible solution of the
transportation problem with the following requirements and availabilities:

                    Availabilities
                    15   25    5
                5
Requirements   15
               10
               15
EXERCISE 4.15 A transportation problem with the cost table


has the feasible solution 


Availabilities 
3 Ps 
4] 0 4 
Requirements 2 1 1 
Rile2 0 


Use the shadow costs method to obtain an optimal solution, and show that the 
minimum cost is 42. 


EXERCISE 4.16 Consider a transportation problem with availabilities, requirements and 
costs as shown in the following table. 


Availabilities 
4 6 10 


3[ 3 | 4 


2 
Requirements 5| 5 ih 6 
12 4 5 


Obtain a feasible solution using the northwest corner method. Use the shadow costs 
method to obtain an optimal solution, and show that the minimum cost is 85. 


You can see that the solution method developed for the transportation problem 
guarantees that the variables always have integer values, assuming the availabilities 
and requirements are integers. This is fortunate, since it would not make sense to 
transport fractional quantities of items. However, for general LP problems there is 
no such guarantee that the simplex method will produce a solution in integers, even 
if this is required for the solution to be valid. You were correct if you thought that 
the way the solution to Example 4.4 came out in integers was highly fortuitous. 
Another example where the answer luckily comes out in integers is provided by 
Exercises 4.7, 4.8 and 4.9, in which a tailor wishes to find the most profitable scheme 
for making suits and dresses. Problem 4.4 involving an allocation of buses provides 
yet a further example where the solution fortunately works out in integers. Finding 
optimal solutions in integer form for general LP problems is a subject called integer 
programming, which is beyond the scope of this book. Instead we end this section 
with some examples of special LP problems where integer solutions can be found 
relatively easily. 
EXAMPLE 4.8


The ‘assignment problem’ is a special case of the transportation problem. The
basic idea is that there are n ‘candidates’ to be allocated to n ‘jobs’, and this is
to be done so that the overall ‘cost’ is optimized.

(a) Four people $P_1, P_2, P_3, P_4$ are to be assigned to each carry out one of four
tasks $T_1, T_2, T_3, T_4$. The ‘fitness’ of person $P_i$ for job $T_j$ is evaluated by some
assessment procedure to have a value $c_{ij}$, with $c_{ij} = 0$ if the person cannot
do this task. Let $x_{ij} = 1$ if $P_i$ is assigned to $T_j$, and $x_{ij} = 0$ otherwise. For
example, the number of jobs done by $P_1$ is $x_{11} + x_{12} + x_{13} + x_{14}$, and since
each person does only one job we can write

$$x_{11} + x_{12} + x_{13} + x_{14} = 1$$

The same argument holds for $P_2$, $P_3$ and $P_4$, so we have

$$x_{i1} + x_{i2} + x_{i3} + x_{i4} = 1, \quad i = 1, 2, 3, 4 \tag{4.41}$$

Similarly, the number of people doing job $T_j$ is

$$x_{1j} + x_{2j} + x_{3j} + x_{4j} = 1, \quad j = 1, 2, 3, 4 \tag{4.42}$$

It is required to maximize the overall effectiveness of the assignment, which
is

$$\sum_{i} \sum_{j} c_{ij} x_{ij} = c_{11}x_{11} + c_{12}x_{12} + \cdots + c_{43}x_{43} + c_{44}x_{44} \tag{4.43}$$

subject to the constraints (4.41) and (4.42).

(b) A university wants to advertise four of its degree programmes in four
different periodicals. The cost varies according to the size of the advertisement
and the publication, and is $c_{ij}$ for advertisement $i$ in periodical $j$. Set
$x_{ij} = 1$ if course $i$ is advertised in periodical $j$, and $x_{ij} = 0$ otherwise. The
overall cost is again (4.43), and in this case is to be minimized. Assuming
that the university can afford to place just one advertisement for each
course, the constraints are again (4.41) and (4.42).

(c) Four firms of building contractors are asked to give quotations for the costs
of carrying out four jobs. However, each firm only has the resources to
undertake one of the contracts. Let $c_{ij}$ be the quotation from contractor $i$ for
job $j$ and let $x_{ij} = 1$ if contractor $i$ does job $j$, and $x_{ij} = 0$ otherwise. The
problem is again to minimize the overall cost (4.43) subject to the constraints
(4.41) and (4.42).
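
The constraints (4.41) and (4.42) say that each row and each column of the array $x_{ij}$ contains exactly one 1, so an assignment is simply a permutation of the jobs. For a problem as small as four by four, every permutation can be tried. The sketch below does this by brute force; it is not an efficient assignment algorithm (the number of permutations grows as n!), the function name `best_assignment` is hypothetical, and the fitness values are invented purely for illustration.

```python
from itertools import permutations


def best_assignment(c, maximize=True):
    """Return (perm, total) where perm[i] is the job given to candidate i and
    total is the value of (4.43) for that assignment."""
    n = len(c)

    def total(perm):
        return sum(c[i][perm[i]] for i in range(n))

    pick = max if maximize else min
    perm = pick(permutations(range(n)), key=total)
    return perm, total(perm)


# Hypothetical fitness values c_ij; a zero means the person cannot do the task.
fitness = [[7, 0, 3, 2],
           [4, 6, 0, 1],
           [2, 5, 8, 0],
           [0, 3, 4, 9]]

perm, value = best_assignment(fitness, maximize=True)
print(perm, value)
```

Passing `maximize=False` covers cases (b) and (c), where the total is a cost to be minimized.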


EXAMPLE 4.9


Five towns are linked by motorways as shown in Figure 4.23.

[Figure 4.23: graph of the five towns $T_1, \ldots, T_5$ and their motorway links]

A supermarket chain has a store in each of the towns. The chain decides to
supply each store from a depot which is either within the same town, or in a
town with which there is a direct motorway link. They want to have as few
depots as possible. Let $x_i = 1$ if town $T_i$ contains a depot, and $x_i = 0$ if it does
not. From Figure 4.23 you can see that town $T_1$ has direct links with towns $T_2$, $T_3$
and $T_5$. Hence the number of depots to which $T_1$ has access is

$$x_1 + x_2 + x_3 + x_5$$

which includes the possibility that there is a depot in $T_1$ itself. The requirement
that $T_1$ must be served by at least one depot means that

$$x_1 + x_2 + x_3 + x_5 \ge 1$$

Similarly for the other towns the constraints are

$$\begin{aligned} T_2&: \quad x_1 + x_2 + x_3 \ge 1 \\ T_3&: \quad x_1 + x_2 + x_3 + x_4 + x_5 \ge 1 \\ T_4&: \quad x_3 + x_4 + x_5 \ge 1 \\ T_5&: \quad x_1 + x_3 + x_4 + x_5 \ge 1 \end{aligned}$$

We have to minimize the total number of depots

$$x_1 + x_2 + x_3 + x_4 + x_5$$


In fact, the answer is that there need be only two depots - one solution is to 
have depots in T, and T,. Can you spot any other solutions? 


EXERCISE 4.17 An athletics competition consists of four track events: 100 m, 400 m, 
800 m, 1500 m. The rules are that each runner from a club can enter only one event, 
and the team with the smallest total time for the four events is the winner. The trial 
times in seconds of four athletes in a club are as follows: 


                        Event
            100 m   400 m   800 m   1500 m
         A    13      64     146      385
Athlete  B    12      62     147      365
         C    14      63     150      360
         D    14      61     145      370


Express in assignment form the problem of selecting who runs in each event so as to 
produce the best result for the club. 
4.4 NETWORKS AND GRAPHS


EXAMPLE 4.10 


A common feature of many road atlases or airline route maps is a diagrammatic
representation of distances between major cities, as illustrated in Figure
4.24. This is an example of a network, which is a set of points called vertices (or
nodes) and a collection of lines called arcs (or edges) joining some or all of
these points in pairs. Each arc has a measure (or ‘weight’) attached to it: in this
example, the distance in km between the two vertices. Incidentally, the drawing
need not be to scale. If there are no numbers associated with the arcs then the
network is given the technical name graph (e.g. see Figure 4.23). This is a
completely different usage of the term from what you are familiar with as the
graph of a function.

[Figure 4.24: road network linking Newcastle, York, Manchester, Bradford and Sheffield, with distances in km]


EXAMPLE 4.11 


Graphs are often useful in giving a pictorial representation of a situation. Let's
look at a simple example of an assignment problem like those in Section 4.3,
but without any optimization involved. Suppose there are five teachers ($T_1$ to
$T_5$) to be timetabled to take five classes in Mathematics, Science, Economics,
History and French. The capabilities of the teachers are

Teacher    Can teach
$T_1$      Mathematics and Science
$T_2$      Mathematics and French
$T_3$      Mathematics and Economics
$T_4$      History and Economics
$T_5$      French and History.

The graphical representation of this information is shown in Figure 4.25. An arc
connects a teacher to a subject if it is one they can teach.

[Figure 4.25: bipartite graph joining the teachers $T_1, \ldots, T_5$ to the subjects Mathematics, Science, Economics, History and French]

The graph in Figure 4.25 is called bipartite, since there are two collections (or
‘subsets’) of vertices (the teachers and the subjects) which have no arcs connecting
vertices in the same subset (there are no arcs connecting teachers, and no arcs
connecting subjects). An assignment problem is to find a set of arcs which connects
just one teacher to each subject.


EXERCISE 4.18 Find two satisfactory assignments in Example 4.11. 


In a road network like Figure 4.24 it’s usually the case that the arcs can be 
travelled in either direction. However, you can imagine a city-centre plan where 
many of the streets are one way to traffic. If the arcs have directional arrows on them 
then the graph is called directed, but in this section we'll only consider undirected 
graphs where there are no arrows on any of the arcs. A convenient notation is to use 
$(i, j)$ to denote an arc connecting vertex $i$ to vertex $j$. For example, consider the
graph in Figure 4.23. The arc connecting vertices $T_1$ and $T_2$ is (1, 2), and the entire
graph can be described by the set of arcs


{(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (3, 5), (4, 5)} 


EXAMPLE 4.12


(a) Looking again at Figure 4.24, you can see that there may be several
different routes joining two cities. For example, to go from Manchester to
Sheffield we could use any of the following routes:

$R_1$: Manchester–Bradford–Newcastle–York–Sheffield
$R_2$: Manchester–Bradford–Newcastle–York–Bradford–Sheffield
$R_3$: Manchester–Bradford–York–Bradford–Sheffield

The route $R_1$ is an example of a path, where no vertex is visited more than
once. The routes $R_2$ and $R_3$ are not paths, since in each case Bradford is
passed through twice. They are examples of what is called a walk, which is
simply any set of arcs connecting one vertex to another. Notice that
‘retracing of steps’ is allowed, as in route $R_3$, where the arc Bradford–York
is traversed in each direction.

(b) In Figure 4.26 the vertices are labelled simply with integers.

[Figure 4.26]

(i) The set of arcs
{(1, 2), (2, 4), (4, 5), (5, 6)}
is a path from vertex 1 to vertex 6.
(ii) The set of arcs
{(1, 3), (3, 5), (5, 4), (4,
is a cycle, which is a path which starts and finishes at the same vertex
(here vertex 1).
(iii) The set of arcs
{(1, 2), (2, 4), (4, 2), (2, 3)}
is a walk from vertex 1 to vertex 3.
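
The distinction between a path, a cycle and a walk depends only on which vertices are repeated, so it is easy to test mechanically. The sketch below assumes the arcs are listed in the order travelled, each arc starting where the previous one ended; the function name `classify` and the sample cycle are hypothetical.

```python
def classify(arcs):
    """Classify a chain of arcs [(a, b), (b, c), ...] as 'path', 'cycle' or 'walk'."""
    for (_, end), (start, _) in zip(arcs, arcs[1:]):
        if end != start:
            raise ValueError("consecutive arcs must share a vertex")
    visited = [arcs[0][0]] + [b for _, b in arcs]
    if len(set(visited)) == len(visited):
        return "path"            # no vertex visited more than once
    closed = visited[0] == visited[-1]
    interior_distinct = len(set(visited[:-1])) == len(visited) - 1
    if closed and interior_distinct and len(arcs) >= 3:
        return "cycle"           # returns to its start, no other repeats
    return "walk"


print(classify([(1, 2), (2, 4), (4, 5), (5, 6)]))   # item (i) above
print(classify([(1, 2), (2, 4), (4, 2), (2, 3)]))   # item (iii) above
print(classify([(1, 2), (2, 3), (3, 1)]))           # a hypothetical cycle
```

The requirement of at least three arcs excludes the case of going along one arc and straight back, which retraces a step and so is treated as a walk.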


EXERCISE 4.19 What is the nature of the following graphs in Figure 4.27?

[Figure 4.27]

(a) {(1, 3), (3, 2), (2, 1)}
(b) {(1, 2), (2, 3), (3, 1), (1, 4)}
(c) {(3, 6), (6, 7), (7, 4), (4, 1), (1, 2)}
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For a road network like that in Figure 4.24 we’re often interested in finding the 
shortest path between two given cities. It’s obvious by looking at Figure 4.24 that 
none of the routes R,, R, or R; is any good in this respect — the shortest distance 
between Manchester and Sheffield is clearly 60 km. What we need for a general 
network is a systematic way of finding a shortest path between two vertices s 
(= start) and f (= finish). This is provided by Dijkstra’ s method. 

We assign a ‘label’ L(i) to every vertex i in the network, equal to the distance to 
that vertex from the starting vertex s along the shortest path found so far. The label 
may be permanent (P), in which case the problem is solved for that vertex; otherwise 
the label is temporary (T), in which case there is uncertainty as to whether the path 
to this vertex from s is the shortest possible. 

We begin with a set of vertices all holding temporary labels denoted by ‘∞’,
which represents a very large number, much larger than any of the numbers assigned
to the arcs of the network.

At each step we reduce by one the number of vertices with temporary labels. 
This is done by finding paths to these vertices using the shortest path to vertices with 
permanent labels, followed by an arc from such a vertex. The vertex with the 
smallest temporary label is then made permanent. The procedure is successively 
repeated, making one new label permanent each time, until the final vertex f receives 
a permanent label. 


Dijkstra’s Algorithm 


Step 1 Set L(s) = 0 and L(i) = ∞ for i ≠ s.
       Let p denote the last vertex to be given a permanent label.
       Set p = s.

Step 2 For each vertex i with a temporary label, compute its new label using the
       formula

       $$L(i) = \min[\,L(i),\; L(p) + d(p, i)\,] \qquad (4.44)$$

       Here L(i) on the left is the new label, and the labels on the right are
       the old labels. In (4.44), d(p, i) is the length of the arc (p, i), and the
       labels within the square brackets are the values at the previous iteration.

       Find the vertex k with the smallest new temporary label.

       Set p = k and make L(p) permanent.


Step 3 If vertex f has a temporary label, repeat Step 2. Otherwise, the solution has 
been obtained. 


Actually using the algorithm is not as complicated as it seems in the formal
description!


EXAMPLE 4.13


Figure 4.28 


Let's find a shortest path between the vertices s and f for the network in Figure 
4.28. The algorithm proceeds as follows: 


Step 1 This assigns the initial labels 


       Vertex        s     1     2     3     4     f
       Label L(i)    0     ∞     ∞     ∞     ∞     ∞
       Status        P     T     T     T     T     T


and we have p=s. 
Step 2 We now redefine the labels L(i) using (4.44). With i = 1 we get

       $L(1) = \min[L(1),\, L(s) + d(s, 1)] = \min[\infty,\, 0 + 31] = 31$

       Notice that we get the values of L(1) and L(s) inside the square
       brackets from the table in Step 1. The value of d(s, 1) comes from
       Figure 4.28. We repeat this process for i = 2, 3, 4, f as follows:

       $L(2) = \min[L(2),\, L(s) + d(s, 2)] = \min[\infty,\, 0 + 57] = 57$
       $L(3) = \min[L(3),\, L(s) + d(s, 3)] = \min[\infty,\, 0 + \infty] = \infty$

       Notice that we denote d(s, 3) by ∞, since there is no arc joining vertices
       s and 3.

       $L(4) = \min[L(4),\, L(s) + d(s, 4)] = \min[\infty,\, 0 + 106] = 106$
       $L(f) = \min[L(f),\, L(s) + d(s, f)] = \min[\infty,\, 0 + \infty] = \infty$

       The smallest of these new temporary labels is L(1) = 31, so k = 1. We
       therefore set p = 1 and make L(1) permanent. The updated table of
       labels is

       Vertex        s     1     2     3     4     f
       Label L(i)    0     31    57    ∞     106   ∞
       Status        P     P     T     T     T     T


Since vertex f has a temporary label, we repeat Step 2 of the 
algorithm for those vertices having a temporary label. Using (4.44) now 
gives 

       $L(2) = \min[L(2),\, L(1) + d(1, 2)] = \min[57,\, 31 + \infty] = 57$
       $L(3) = \min[L(3),\, L(1) + d(1, 3)] = \min[\infty,\, 31 + 90] = 121$
       $L(4) = \min[L(4),\, L(1) + d(1, 4)] = \min[106,\, 31 + 97] = 106$
       $L(f) = \min[L(f),\, L(1) + d(1, f)] = \min[\infty,\, 31 + \infty] = \infty$


The smallest of these values is L(2)=57 so we set p=2, make L(2) 
permanent, and the new table is 


       Vertex        s     1     2     3     4     f
       Label L(i)    0     31    57    121   106   ∞
       Status        P     P     P     T     T     T


We again repeat Step 2 of the algorithm. 
The new labels are 


       $L(3) = \min[L(3),\, L(2) + d(2, 3)] = \min[121,\, 57 + 83] = 121$
       $L(4) = \min[L(4),\, L(2) + d(2, 4)] = \min[106,\, 57 + 49] = 106$
       $L(f) = \min[L(f),\, L(2) + d(2, f)] = \min[\infty,\, 57 + \infty] = \infty$

       We see that p = 4, so L(4) is made permanent and the new table is

       Vertex        s     1     2     3     4     f
       Label L(i)    0     31    57    121   106   ∞
       Status        P     P     P     T     P     T


       Vertex        s     1     2     3     4     f
       Label L(i)    0     31    57    121   106   156
       Status        P     P     P     P     P     P


We have therefore found that the shortest path from s to f in Figure 4.28 has length 
156, since by definition this is the value of L(f) when L(f) has become permanent. 
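
The labelling procedure of Steps 1 to 3 can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the names `ARCS`, `d` and `dijkstra_labels` are hypothetical, and the network contains only the arcs whose lengths are quoted in Example 4.13, so any other arcs drawn in Figure 4.28 are assumed not to affect the result.

```python
INF = float("inf")

# Arc lengths quoted in Example 4.13 (assumption: no other arc matters).
ARCS = {("s", "1"): 31, ("s", "2"): 57, ("s", "4"): 106,
        ("1", "3"): 90, ("1", "4"): 97, ("2", "3"): 83,
        ("2", "4"): 49, ("3", "f"): 35}
VERTICES = ["s", "1", "2", "3", "4", "f"]


def d(i, j):
    """Length of the arc joining i and j, or infinity if there is no such arc."""
    return ARCS.get((i, j), ARCS.get((j, i), INF))


def dijkstra_labels(vertices, s, f):
    # Step 1: L(s) = 0, every other label is 'infinity'; s is permanent.
    L = {i: INF for i in vertices}
    L[s] = 0
    permanent = {s}
    p = s
    # Step 3: repeat Step 2 until f holds a permanent label.
    while f not in permanent:
        temporary = [i for i in vertices if i not in permanent]
        # Step 2: update every temporary label with formula (4.44).
        for i in temporary:
            L[i] = min(L[i], L[p] + d(p, i))
        k = min(temporary, key=L.get)      # smallest temporary label
        if L[k] == INF:
            raise ValueError("f cannot be reached from s")
        p = k
        permanent.add(p)
    return L


labels = dijkstra_labels(VERTICES, "s", "f")
print(labels["f"])   # 156
```

For this network every vertex happens to be permanent by the time f is, so the returned labels agree with the final table of the example.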


EXERCISE 4.21 Use Dijkstra’s algorithm to find the length of the shortest path from 
vertex s to vertex f for the network shown in Figure 4.29. 


Figure 4.29 


The algorithm as described so far gives only the length of the shortest path. To 
find the shortest path itself we add another step to the algorithm, enabling us to find 
the vertices on the optimal path. 


Step 4 For each permanently labelled vertex j other than the starting vertex s, 
define a vertex r(j) as follows: 


$$r(j) = i, \quad \text{where } L(j) = L(i) + d(i, j), \quad i \neq j \qquad (4.45)$$

If this is not unique it means there is more than one shortest path. A shortest
path, in reverse order, is

$$[r(f), f] = (a_1, f), \quad [r(a_1), r(f)] = (a_2, a_1), \quad [r(a_2), r(a_1)] = (a_3, a_2), \quad [r(a_3), r(a_2)] = (a_4, a_3), \ \ldots \qquad (4.46)$$


as shown in Figure 4.30. 


Figure 4.30


EXAMPLE 4.13 (continued)


Use the final table of labels, set out in Exercise 4.20. In (4.45) take j= 1 and refer 
to Figure 4.28 to obtain 


    r(1) = s, since L(1) = 31 = L(s) + d(s, 1)

With j = 2 in (4.45) we get

    r(2) = s, since L(2) = 57 = L(s) + d(s, 2)

and similarly

    j = 3: r(3) = 1, since L(3) = 121 = L(1) + d(1, 3)
    j = 4: r(4) = 2, since L(4) = 106 = L(2) + d(2, 4)
    j = f: r(f) = 3, since L(f) = 156 = L(3) + d(3, f)

Using (4.46), the last arc of the shortest path is

    [r(f), f] = (3, f)

Equation (4.46) shows that the previous arc is

    [r(3), r(f)] = (1, 3)

Repeat the process to get the next previous arc:

    [r(1), r(3)] = (s, 1)


and we have now reached the starting vertex. The shortest path is therefore as
shown in Figure 4.31.


Figure 4.31 (the path s, 1, 3, f, with arc lengths 31, 90 and 35)
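
Step 4 is easy to mechanize once the permanent labels are known. The sketch below is an illustration under stated assumptions: the labels are copied from the final table of Example 4.13, only the arc lengths quoted in that example are used, and the names `predecessor` and `shortest_path` are hypothetical.

```python
INF = float("inf")

# Arc lengths quoted in Example 4.13 and the final permanent labels L(i).
ARCS = {("s", "1"): 31, ("s", "2"): 57, ("s", "4"): 106,
        ("1", "3"): 90, ("1", "4"): 97, ("2", "3"): 83,
        ("2", "4"): 49, ("3", "f"): 35}
L = {"s": 0, "1": 31, "2": 57, "3": 121, "4": 106, "f": 156}


def d(i, j):
    """Length of the arc joining i and j, or infinity if there is no such arc."""
    return ARCS.get((i, j), ARCS.get((j, i), INF))


def predecessor(j):
    """Return a vertex r(j) = i satisfying (4.45): L(j) = L(i) + d(i, j), i != j."""
    for i in L:
        if i != j and L[j] == L[i] + d(i, j):
            return i
    raise ValueError("no predecessor found for " + j)


def shortest_path(s, f):
    """Trace the arcs of (4.46) backwards from f, then reverse the result."""
    path = [f]
    while path[-1] != s:
        path.append(predecessor(path[-1]))
    return path[::-1]


print(shortest_path("s", "f"))   # ['s', '1', '3', 'f']
```

When (4.45) has more than one solution, as it does for vertex 4 here, this sketch simply takes the first one found; any choice leads to a shortest path.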


EXERCISE 4.22 Determine a shortest path for the network in Exercise 4.21. 


EXERCISE 4.23 What is the effect on the shortest path of changing d(5, f) from 5 to 6 in
Figure 4.29?


If there is no arc connecting vertices i and j then we set d(i, j) = ∞ in Dijkstra’s
algorithm. In the same way we can apply the method to a directed network by setting
d(i, j) = ∞ if there is no directed arc from i to j.

We now turn to an interesting type of graph called a tree. The name arises from
the idea of a ‘family tree’ which shows the relationships between generations — a 
family tree for a bee population is shown in Figure 1.5 in Chapter 1. 


EXAMPLE 4.14


The United Kingdom postcode consists of two letters signifying the postal
town, for example LS stands for Leeds. There are 120 postcode areas. This is
followed by one or two digits denoting the postcode district, for example LS1 is
part of central Leeds, LS22 is Wetherby and LS29 is Ilkley. Finally, one or two
digits denote the sector within the district, and two more alphabetical
characters pinpoint the address to within 15 letterboxes on average. The
diagram in Figure 4.32 is a tree which shows how an item sent to a destination
with postcode LS29 8TL can be sorted in stages.


Figure 4.32 (levels of the tree: postal areas in Yorkshire; Leeds districts;
individual postcodes LS29 8AA, LS29 8AB, ..., LS29 8TJ, LS29 8TK, LS29 8TL)


A graph is called connected if there is a path from each vertex to every other 
vertex. The precise definition of a tree is a connected graph which does not contain 
any cycles (defined in Example 4.12(b)). Trees have the following properties: 


(i) If there are n vertices then there are n − 1 arcs.

(ii) Every pair of vertices is linked by exactly one path. 

(iii) The removal of any arc produces a disconnected graph (i.e. one which is 
not connected). 


Properties (ii) and (iii) are direct consequences of the definition of a tree. 
Property (i) requires a proof by induction, and can be found in textbooks on graphs. 
It is interesting, however, that a converse result holds which enables us to recognize 
trees: if a graph has no cycles, n vertices and n − 1 arcs then it is a tree (see Problem
4.14). 
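
A computer test for a tree can be built on property (i). The sketch below relies on a standard fact that is not proved in the text, namely that a connected graph with n vertices and n − 1 arcs contains no cycles. The function name `is_tree` and the convention that the vertices are numbered 1 to n are assumptions made for the illustration.

```python
def is_tree(n, arcs):
    """Return True if the graph with vertices 1..n and the given arcs is a tree."""
    # Property (i): a tree on n vertices has exactly n - 1 arcs.
    if len(arcs) != n - 1:
        return False
    neighbours = {v: set() for v in range(1, n + 1)}
    for i, j in arcs:
        neighbours[i].add(j)
        neighbours[j].add(i)
    # Connectedness: every vertex must be reachable from vertex 1.
    seen, stack = {1}, [1]
    while stack:
        v = stack.pop()
        for w in neighbours[v] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n


print(is_tree(4, [(1, 2), (2, 3), (3, 4)]))   # True: a simple chain
print(is_tree(4, [(1, 2), (2, 3), (3, 1)]))   # False: a cycle plus an isolated vertex
```

The second example has the right number of arcs but is not connected, which shows why counting arcs alone is not enough.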


Figure 4.33 


Figure 4.34


EXAMPLE 4.15


All possible trees having six vertices are shown in Figure 4.33. You can see that 
they each have five arcs. Also, there is a path connecting every pair of vertices; 
if one arc is removed then we are left with a disconnected graph, some 
examples of this being given in Figure 4.34 for each of the trees in Figure 4.33. 


Trees can be used to count all the possible ways in which a sequence of events 
can occur. 


EXAMPLE 4.16


An unbiased coin is tossed three times. The various possible outcomes are 
shown by the tree in Figure 4.35. We read from left to right, and H indicates the 
coin comes up heads, T that it shows tails. 


Figure 4.35 


The numbers on the arcs indicate the probability that each event occurs,
and these are all $\tfrac{1}{2}$, since the coin is unbiased. For example, two heads and one
tail can be obtained in three ways:


HHT, HTH, THH


Following these on the tree, you can see that the probability of each of them is
$\tfrac{1}{2} \times \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{8}$, so the overall probability of getting two heads and one tail is

$$\tfrac{1}{8} + \tfrac{1}{8} + \tfrac{1}{8} = \tfrac{3}{8}$$
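
The branches of the tree can also be listed by computer. The following sketch enumerates the eight outcomes of three tosses and adds up the probabilities of those with exactly two heads; the variable names are chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

# Each branch of the tree is a string such as 'HTH'; there are 2**3 = 8 of them.
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Multiply the probabilities along a branch: 1/2 x 1/2 x 1/2.
probability = {outcome: half ** 3 for outcome in outcomes}

two_heads = [outcome for outcome in outcomes if outcome.count("H") == 2]
print(two_heads)                                  # ['HHT', 'HTH', 'THH']
print(sum(probability[o] for o in two_heads))     # 3/8
```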


EXAMPLE 4.17


Suppose that in a tennis tournament the rule is that a player is eliminated after
a single defeat. Let’s look at the problem of how to find the smallest number of
matches which need to be played so that there is an overall winner. To
represent the tournament by a tree, start with the last match, the winner of
which will be the overall champion. The players in this last match must have
been the winners of two previous matches as represented in Figure 4.36.


Figure 4.36 (levels: final match; semifinals)

Similarly the four semifinalists must have been the winners of four
previous matches, as shown in Figure 4.37. At each stage of the tournament the
number of black vertices is the number of matches played, and the number of
players involved is therefore twice this. For example, at the quarter-final stage
in Figure 4.37 there are four matches and eight players. Thus if there were
exactly eight players, the situation shown in Figure 4.37 would be sufficient, and
there would be a total of 4 + 2 + 1 = 7 matches.


Figure 4.37 (levels: final match; semifinals; quarter-finals)


However, if there were only seven players we would have the situation 
shown in Figure 4.38; one of the players would not compete until the semifinals, 
and this player's bye in the first round is represented in Figure 4.38 by a white 
vertex. In total there are 3+ 2+ 1=6 matches in this case. 


Figure 4.38 (levels: final match; semifinals; 1st round)


The tree for nine players is shown in Figure 4.39(a): here there is only
one first-round match, then four second-round, two third-round and a final,
giving eight matches in total. Trees representing tournaments with 10 and 11
players are given in Figure 4.39(b) and (c) respectively.


Figure 4.39 (panels (a), (b) and (c))


EXERCISE 4.24 For the tennis tournament in Example 4.17 draw a tree and hence
determine the smallest number of matches when there are (a) 15, (b) 19, (c) 27
competitors.


EXERCISE 4.25 A total of six black and six white balls is distributed between three boxes. 
Box 1 contains one black and two white balls, box 2 contains two black and three white 
balls, and box 3 contains three black and one white balls. You select a container at 
random and take one ball out of it (also at random). Draw a tree which represents all the 
possible outcomes. Indicate on each arc the probability that the particular event occurs. 
Hence determine whether it is more likely that a black ball or a white ball is chosen. 


EXERCISE 4.26 Twenty teams have entered for a ‘knockout’ football competition, in 
which the winners of round 1 proceed to round 2, and so on, until a single winning 
team emerges. There are no drawn games. Draw a tree showing a scheme for the 
tournament, with the smallest possible number of rounds and all byes in round 1. 


In Example 4.17 it was seen that with 7, 8, 9, 10 or 11 players there would be 6, 
7, 8, 9 or 10 matches respectively. You will have found similar results for the 
‘knockout competition’ problems in Exercises 4.24 and 4.26. In fact, it’s easy to 
generalize this: if there are n players in a knockout tournament then there will be 
exactly n − 1 matches. This is because every player loses exactly one match (and is
then out of the competition), with the exception of the overall winner who does not 
lose any match. 
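
The n − 1 rule can be confirmed by simulating the rounds. The sketch below uses a scheduling rule that is an assumption of this illustration, not the scheme of Figures 4.37 to 4.39: in every round as many matches as possible are played, and one player receives a bye whenever the number remaining is odd. The total number of matches does not depend on how the byes are arranged.

```python
def count_matches(n):
    """Count the matches in a knockout tournament with n players."""
    matches = 0
    while n > 1:
        played = n // 2      # pair off the players; an odd one out gets a bye
        matches += played
        n -= played          # each match eliminates exactly one player
    return matches


for players in (7, 8, 9, 10, 11):
    print(players, count_matches(players))   # 6, 7, 8, 9 and 10 matches respectively
```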


EXERCISE 4.27 Two players A and B compete in a chess tournament. The winner is the 
one who is first either to win two games in a row, or to win a total of three games. 
Draw a tree which shows all the possible ways in which the game can proceed to a 
conclusion. 


EXERCISE 4.28 A frog hops along a straight line (the x-axis). It begins at the origin, and
each jump has unit length either to the right or to the left. It stops either when it has
made a total of four jumps, or if it reaches x = 3 or x = −2. Draw a tree which
represents all the possible paths the frog can travel.


If for a particular graph G we delete some of the arcs but keep the same set of 
vertices we obtain what is called a subgraph of G. If this subgraph is a tree, it is 
called a spanning tree for G. 


EXAMPLE 4.18


For the graph in Figure 4.26, some (but not all) of the spanning trees are shown
in Figure 4.40.


Figure 4.40


EXAMPLE 4.19


A spanning tree is shown in Figure 4.41(b) for the graph in Figure 4.41(a). 


Figure 4.41 (panels (a) and (b))


If the arc x is removed then the graph in (b) is no longer a spanning tree, 
since it becomes a disconnected graph. Alternatively, if the arc y (indicated by a 
dashed line) is added then (b) is no longer a tree since it contains a cycle. 

EXERCISE 4.29 Draw the eight spanning trees for the graph in Figure 4.42.


Figure 4.42


If we have a network where the weights on the arcs represent distance, then a
minimal spanning tree is a spanning tree which has the least possible length. It is
important to determine minimal spanning trees for applications such as designing a
network to connect a set of computers, planning a layout for a cable television
network, or siting warehouses to be linked by a road system. For this reason a
minimal spanning tree is also called a minimal connector.


EXERCISE 4.30 Suppose fibre-optic communications cables are to be laid alongside the
main roads shown in Figure 4.24, so that all the cities are connected. Determine a
scheme which uses the shortest length of cable.


You shouldn’t have had too much trouble solving Exercise 4.30 by trial and error 
— in fact, there are two possible minimal connectors. In a large problem, however, 
we need a systematic way of finding a minimal spanning tree, and this is provided by 
Prim's method. We begin constructing the tree with one arc. We then add one arc 
which is the shortest of the remaining arcs which have one vertex in the tree (we 
reject arcs which have two vertices in the tree, since they would produce a cycle, and 
arcs which have no vertices in the tree, since this would produce a disconnected 
graph). The procedure is continued until a spanning tree is obtained, and by 
construction this will be minimal. 


Prim’s Algorithm 
Step 1 Take any vertex in the network, and select from this vertex the arc which
       has the shortest length. Call this graph T.


Step 2 Select the arc (i, j) with the smallest length from amongst all arcs (i, k)
       with i in T and k not in T. Add this arc to T.


Step 3 If T is a spanning tree for the given network, the solution has been
       obtained. Otherwise, repeat Step 2.


EXAMPLE 4.20


We'll use Prim’s algorithm to find a minimal spanning tree for the network 
shown in Figure 4.43. 


Figure 4.43

Step 1 Begin at vertex 1 and select arc (1, 2), which is the shortest arc
       originating at this vertex. Hence
           T = {(1, 2)}
Step 2 Consider all the arcs which have one vertex in T:
           (1, 5), (1, 3), (2, 3)
       The shortest is (1, 3), so we add this to T, producing
           T = {(1, 2), (1, 3)}
Step 3 T is not a spanning tree; go to Step 2.
Step 2 Consider arcs with one vertex in T:
           (1, 5), (3, 4), (3, 5)
       The shortest is (3, 4); adding this to T gives
           T = {(1, 2), (1, 3), (3, 4)}
Step 3 T is not a spanning tree; go to Step 2.
Step 2 Consider arcs
           (1, 5), (3, 5), (4, 5)
       The shortest is (4, 5), giving
           T = {(1, 2), (1, 3), (3, 4), (4, 5)}
Step 3 T is now a spanning tree, and therefore is a minimal spanning tree
       with length 23, shown in Figure 4.44.


Figure 4.44
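
Prim’s algorithm translates almost directly into code. In the sketch below the arc lengths in `WEIGHTS` are hypothetical: the lengths in Figure 4.43 are not quoted in the text, so values have been invented that are consistent with the choices made in Example 4.20 and with the total length of 23. The function name `prim` is likewise an assumption, and the network is assumed to be connected.

```python
# Hypothetical arc lengths, chosen only to agree with the steps of Example 4.20.
WEIGHTS = {(1, 2): 4, (1, 3): 6, (1, 5): 12, (2, 3): 7,
           (3, 4): 5, (3, 5): 9, (4, 5): 8}


def prim(weights, start):
    """Return the arcs of a minimal spanning tree, in the order they are added."""
    vertices = {v for arc in weights for v in arc}
    in_tree = {start}
    tree = []
    while in_tree != vertices:
        # Steps 1 and 2: arcs with exactly one vertex already in the tree.
        candidates = [arc for arc in weights
                      if (arc[0] in in_tree) != (arc[1] in in_tree)]
        best = min(candidates, key=weights.get)
        tree.append(best)
        in_tree.update(best)
    # Step 3: the loop stops once every vertex has been reached.
    return tree


tree = prim(WEIGHTS, 1)
print(tree)                               # [(1, 2), (1, 3), (3, 4), (4, 5)]
print(sum(WEIGHTS[arc] for arc in tree))  # 23
```

Starting from a different vertex changes the order in which arcs are added, but for these invented lengths the total is again 23.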


EXERCISE 4.31 Repeat Example 4.20, starting at vertex 3. 


EXERCISE 4.32 Use Prim’s algorithm to find a minimal spanning tree for the network in
Figure 4.45 starting at vertex 7. Repeat, starting at vertex 1, to obtain a different
minimal spanning tree.


Figure 4.45


Three other optimization problems associated with networks are as follows. 


The travelling salesperson problem 


A salesperson has to visit a number of towns and return to the starting point. Each 
town is to be visited exactly once, and the travelling times between the towns are 
known. The problem is to carry out the tour in the shortest possible time. 


The Chinese postperson problem 


A postperson picks up mail at the sorting office and has to deliver it in a certain district, 
and then return to the depot. Each street in the district has to be covered at least once. 
The problem is to choose a route requiring the shortest possible distance to be travelled. 
This problem is called ‘Chinese’ because it was first studied by a Chinese mathematician
in 1962, not because of any peculiarities of the Chinese postal system! A similar
problem faces a truck carrying out salting and gritting of roads in winter. 


Critical path analysis 


For a complex project, the times to carry out various activities are represented on a 
network which shows interrelationships between the activities. For example, to build 
a house involves laying the foundations, erecting the walls, putting on the roof, 
installing the electric wiring and plumbing, plastering, decorating and many other 
jobs. The problem is to find a ‘critical path’ through the network which identifies the 
minimum time in which the project can be completed. 


We don’t have space to consider these problems here, but they are dealt with in
several of the books listed at the end of this chapter. In particular, the book by
Wilson and Watkins (1990) also describes other interesting applications of graphs.

We'll end this section with a brief look at how matrices can be applied to graphs.
First, go back to the idea of the bipartite graph, an example of which was shown in
Figure 4.25.


EXAMPLE 4.21 


The bipartite network in Figure 4.46 represents connections between three
airports $A_1$, $A_2$, $A_3$ in country A with airports $B_1$, $B_2$ in a second country B.


Figure 4.46


The numbers on the arcs are the numbers of different airlines flying on that
route; for example, there are four airlines offering flights from $A_2$ to $B_1$. We can
express the information provided by this network in the matrix

$$T = \begin{bmatrix} 5 & 3 \\ 4 & 1 \\ 0 & 2 \end{bmatrix}$$

The rows correspond to the airports $A_1$, $A_2$, $A_3$ and the columns to $B_1$ and $B_2$.
Each entry in row i, column j gives the number of different flights from $A_i$ to $B_j$,
so for example the 2, 1 entry is 4. Suppose that there are onward connections to
three airports $C_1$, $C_2$, $C_3$ in a third country C, shown in Figure 4.47.


Figure 4.47


The matrix for this network is

$$S = \begin{bmatrix} 3 & 2 & 1 \\ 4 & 0 & 2 \end{bmatrix}$$

where the rows correspond to airports in country B and columns to country C. If
we combine together the two networks, then the matrix showing flights
between countries A and C is just the product

$$TS = \begin{bmatrix} 27 & 10 & 11 \\ 16 & 8 & 6 \\ 8 & 0 & 4 \end{bmatrix} \qquad (4.47)$$

This means, for example, that there are four ways of flying from $A_3$ to $C_3$; you
can check this by looking at Figures 4.46 and 4.47, where you'll see there are
two ways of flying from $A_3$ to $B_2$, and two ways from $B_2$ to $C_3$, giving four
different ways in total. Similarly, it is not possible to fly from $A_3$ to $C_2$ (shown by
the zero entry in the 3, 2 position in (4.47)), but there are 10 ways of going from
$A_1$ to $C_2$.
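
The product in (4.47) is easily checked by machine. In the sketch below the entries of `S` are an assumption: they are the only values consistent with the matrix T and the product (4.47), since the last two rows of T determine each column of S. The helper name `matmul` is hypothetical.

```python
# T: flights from A1, A2, A3 (rows) to B1, B2 (columns).
T = [[5, 3],
     [4, 1],
     [0, 2]]

# S: flights from B1, B2 (rows) to C1, C2, C3 (columns); inferred from (4.47).
S = [[3, 2, 1],
     [4, 0, 2]]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*Y)]
            for row in X]


TS = matmul(T, S)
for row in TS:
    print(row)

# Entry (3, 3): routes from A3 to C3 via B1 plus routes via B2.
print(T[2][0] * S[0][2] + T[2][1] * S[1][2])   # 0*1 + 2*2 = 4
```

Each entry of the product adds up, over the intermediate airports, the number of ways of completing the two legs of the journey, which is exactly the counting argument used in the example.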


EXERCISE 4.33 For the airline network shown in Figure 4.48 write down the matrices for 
flights from country A to country B, and from B to C. Hence obtain the matrix giving 
the numbers of different flights from airports in country A to those in country C. 


Figure 4.48 


If we have a graph rather than a network then we can still associate a matrix A
with it, called the adjacency matrix. This time the elements of A are either 1 or 0. An
element $a_{ij} = 1$ in row i, column j indicates there is an arc connecting vertices i and j.
If $a_{ij} = 0$ then there is no arc between these vertices. Obviously $a_{ij} = a_{ji}$ for all i and
j, which means that A is symmetric. An adjacency matrix is a convenient way of
storing a graph in a computer.


EXAMPLE 4.22


The adjacency matrix for the graph in Figure 4.23 is 


0 

1 
A=/1 (4.48) 

0 

1 


co-0- 
es Cy os oe 
-=o+-00 
o--02 
There are five vertices in this graph, so A is a 5 × 5 matrix. In general if there are
n vertices then A is n × n. Notice also that in (4.48) all the elements on the
principal diagonal are zero. In general the i, i element of an adjacency matrix
will be 1 only if there is an arc connecting vertex i to itself (such an arc is called
a loop). You can see that A in (4.48) is symmetric: the elements in the first row
are the same as those in the first column, the second row is the same as the
second column, and so on.


The adjacency matrix can be used to investigate certain properties of graphs. 
Recall that a walk is a set of arcs joining one vertex to another in which repetitions 
of arcs are allowed — that is, we can ‘retrace our steps’. Illustrations of walks were 
given in Example 4.12. In particular, the walk given in part (iii) of Example 4.12(b) 
is a walk from vertex 1 to vertex 3 for the graph in Figure 4.26 involving four arcs. 
It matters in which order the arcs are taken: for example, in Figure 4.26 the two
walks


{(1, 2), (2, 3), (3, 1)},   {(1, 3), (3, 2), (2, 1)}


from vertex 1 to itself are different from each other.

It can be shown that if we raise the adjacency matrix for a graph to the pth
power then the element in row i, column j of $A^p$ is the number of different walks
from vertex i to vertex j using p arcs.


EXAMPLE 4.23


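Consider the graph in Figure 4.42. Its adjacency matrix is

$$A = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \end{bmatrix}$$

Using the standard multiplication rule given in Section 1.4, Chapter 1, it’s easy
to multiply A by itself to get

$$A^2 = \begin{bmatrix} 3 & 1 & 2 & 1 \\ 1 & 2 & 1 & 2 \\ 2 & 1 & 3 & 1 \\ 1 & 2 & 1 & 2 \end{bmatrix} \qquad (4.49)$$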


The element $b_{ij}$ in the i, j position of $A^2$ is the number of different walks
between vertices i and j in Figure 4.42 using two arcs. For example, for $b_{11} = 3$,
the three walks from vertex 1 to itself using two arcs are

    (1, 2), (2, 1)
    (1, 3), (3, 1)
    (1, 4), (4, 1)

Notice that in each of these we retrace our steps. For $b_{12} = 1$, the only walk from
vertex 1 to vertex 2 using two arcs is

    (1, 3), (3, 2)

and for $b_{13} = 2$, the two walks from vertex 1 to vertex 3 are

    (1, 2), (2, 3)
    (1, 4), (4, 3)

Notice that $A^2$ in (4.49) is also symmetric, so only the elements on and
above the principal diagonal (top left corner to bottom right corner) need to be
computed. In fact, the portion of $A^2$ below the principal diagonal doesn’t give
us any extra information: for example, $b_{32} = b_{23}$, which simply says that the
number of walks from vertex 3 to vertex 2 using two arcs is the same as the
number from vertex 2 to vertex 3.
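
The link between powers of the adjacency matrix and walks can be tested by listing the walks directly. In the sketch below the matrix `A` is an assumption: it is the adjacency matrix inferred from the walks listed in this example (arcs joining 1 and 2, 1 and 3, 1 and 4, 2 and 3, 3 and 4). The helper names `matmul` and `walks` are hypothetical.

```python
# Adjacency matrix inferred from the walks quoted in Example 4.23.
A = [[0, 1, 1, 1],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [1, 0, 1, 0]]


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*Y)]
            for row in X]


def walks(A, i, j, p):
    """List all walks with p arcs from vertex i to vertex j (vertices numbered from 1)."""
    if p == 0:
        return [[i]] if i == j else []
    found = []
    for k in range(1, len(A) + 1):
        if A[i - 1][k - 1]:                      # there is an arc from i to k
            found += [[i] + rest for rest in walks(A, k, j, p - 1)]
    return found


A2 = matmul(A, A)
print(A2[0])                # first row of A squared: [3, 1, 2, 1]
print(walks(A, 1, 1, 2))    # [[1, 2, 1], [1, 3, 1], [1, 4, 1]]
print(walks(A, 1, 3, 2))    # [[1, 2, 3], [1, 4, 3]]
```

Each walk is printed as the list of vertices visited, so `[1, 2, 3]` stands for the arcs (1, 2), (2, 3).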


EXERCISE 4.34 List the walks corresponding to all the remaining elements of $A^2$ in
(4.49).


EXERCISE 4.35 Multiply $A^2$ in (4.49) by A to obtain $A^3$. List the walks using three arcs,
corresponding to the elements of $A^3$.


EXERCISE 4.36 Using the adjacency matrix A in (4.48) for the graph in Figure 4.23,
compute $A^2$. List the walks corresponding to its elements.


4.5 OPTIMAL CONTROL 


In the discussion at the end of Example 3.2 in Chapter 3 we briefly introduced the 
notion of optimal control. The basic idea is to control a system in some ‘best possible 
way’ according to a particular aim or objective. The simple model discussed in 
Example 3.2 was of a car being driven along a straight road; a suggested optimization
problem was to drive from one set of green traffic lights to the next set at red,
either in the shortest possible time, or using the least possible amount of fuel. 
Clearly the control which performs the required task (i.e. the ‘optimal’ control) will 
be quite different according to which objective is aimed at. You probably feel 
instinctively that in the first case, to do the journey as quickly as possible you would 
accelerate flat out, and then put your foot hard down on the brake pedal so as to 
screech to a stop at the red light! In the second case you would minimize the fuel 
consumption by accelerating gently, and then coasting along before coming to a 
standstill at the light. 

Finding an optimal control even for a simple system involves quite a bit of
calculus, so you may wish to skip this section if you have little or no experience in
differentiation and integration. As usual, we’ll try to keep the technicalities to a
minimum, but this last section of the book is something of a bridge to more
advanced work on optimization.


EXAMPLE 4.24

The very simplest control model is described by a single equation

$$\frac{dx}{dt} = u \qquad (4.50)$$

where the state variable x(t) and the control variable u are both functions of
time t. Suppose that initially at time t = 0 the system is at the origin, that is

$$x(0) = 0 \qquad (4.51)$$

and it is required to choose u so as to transfer the system to x = 1 at t = 1, that is

$$x(1) = 1 \qquad (4.52)$$

There are many control functions which will perform this transfer; indeed, as
many as you like! Some simple functions which do the trick are

$$u = 1, \qquad u = 2t, \qquad u = 4 - 6t \qquad (4.53)$$

If we solve the differential equation (4.50) in each case, simply by integrating
each of the expressions in (4.53) and using the initial condition (4.51), the three
corresponding solutions x(t) of (4.50) are

$$x = t, \qquad x = t^2, \qquad x = 4t - 3t^2 \qquad (4.54)$$

You can easily check that each of the expressions in (4.54) satisfies the
conditions (4.51) and (4.52), and when differentiated gives the corresponding
term in (4.53). The graphs of the functions in (4.54) are shown in Figure 4.49.


Figure 4.49


Clearly there are an infinite number of curves connecting the origin O with the
point (1, 1). The idea of optimal control is to select a control which minimizes
some ‘objective function’. Suppose in this example we think of u as some kind
of control force, and we want to minimize the energy used in transferring the
system from its initial state to its final state. The energy used is proportional to
$u^2$, because energy is required whether u is positive or negative. The total
energy consumed from t = 0 to t = 1 is

$$J = \int_0^1 u^2 \, dt \qquad (4.55)$$

and J is our objective function. If we substitute into (4.55) the three different
controls (4.53) we get, using standard results in integration,

$$\int_0^1 1^2 \, dt = \big[\, t \,\big]_0^1 = 1$$

$$\int_0^1 (2t)^2 \, dt = \left[ \tfrac{4}{3} t^3 \right]_0^1 = \tfrac{4}{3}$$

$$\int_0^1 (4 - 6t)^2 \, dt = \left[ -\frac{(4 - 6t)^3}{18} \right]_0^1 = 4$$

The first control u = 1 in (4.53) therefore gives the smallest value of J in (4.55),
but we can’t say that it is optimal because we've only tried three out of the
infinite number of possible controls.

We see in Figure 4.49 that the third control u=4-6t in (4.53) causes the 
system to ‘overshoot’ its target — that is, it goes past x=1 and then returns. If 
we want to avoid this we might decide to minimize the area under the curve 
x(t), that is take instead of (4.55) a different objective function 


$$J = \int_0^1 x \, dt \qquad (4.56)$$


In fact we can see from Figure 4.49, without doing any calculations, that the
area under the curve is smaller when u = 2t than for the other two cases.
Again, we can’t be sure that u=2t is the best possible control to minimize J in 
(4.56), but we have illustrated the fact that the selection of an optimal control 
will depend very much upon the nature of the objective function. 
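
The two objective functions can be compared exactly for the three polynomial controls. The sketch below represents each control by a list of coefficients (lowest power first) and integrates term by term with exact fractions; the helper names are hypothetical and the approach only works for polynomial controls.

```python
from fractions import Fraction


def integrate(poly):
    """Antiderivative with zero constant term; poly[k] is the coefficient of t**k."""
    return [Fraction(0)] + [Fraction(c) / (k + 1) for k, c in enumerate(poly)]


def evaluate(poly, t):
    return sum(c * t ** k for k, c in enumerate(poly))


def multiply(p, q):
    product = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            product[i + j] += a * b
    return product


def definite_integral(poly):
    """Integral of the polynomial from t = 0 to t = 1."""
    return evaluate(integrate(poly), 1)


def summarise(u):
    """Return x(1), the energy (4.55) and the area (4.56) for the control u."""
    x = integrate(u)                      # solves dx/dt = u with x(0) = 0
    return evaluate(x, 1), definite_integral(multiply(u, u)), definite_integral(x)


controls = {"u = 1": [1], "u = 2t": [0, 2], "u = 4 - 6t": [4, -6]}
for name, u in controls.items():
    print(name, *summarise(u))
```

Every control gives x(1) = 1; the energies are 1, 4/3 and 4, while the areas under x(t) are 1/2, 1/3 and 1, so the two objective functions rank the controls differently.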


EXERCISE 4.37 Verify that $x = t^3$ satisfies the conditions (4.51) and (4.52). Obtain u from
(4.50) and evaluate J in (4.55) in this case.
Repeat this with $x = t^n$, where n is a positive integer, and obtain the expression
for J in terms of n. What happens as n gets larger and larger?


EXAMPLE 4.24 (continued)


When we used the objective function J in (4.55), of the three controls in (4.53)
we found that u = 1 was the best, since it gave the smallest value of J. We now
establish that this is the best of all possible controls, by introducing the idea of
the hamiltonian function (named after the nineteenth-century Irish mathematician
Hamilton) which is defined by

$$H = pu + u^2 \qquad (4.57)$$

In (4.57) p is a new variable, which also depends upon time t, and is called the
adjoint variable. The expression (4.57) is obtained by multiplying the right-hand
side of the differential equation (4.50) by p, and adding the quantity $u^2$ from
inside the integral (4.55). It can be shown that the optimal control u* is given by
the condition

$$\frac{\partial H}{\partial u} = 0 \qquad (4.58)$$

where this notation, called the partial derivative of H with respect to u, simply
means that we differentiate H in the usual way, regarding everything except u
as a constant. Applying (4.58) to (4.57), in which we regard p as a constant so
far as differentiation is concerned, gives us

$$p + 2u = 0$$

so that the optimal control is

$$u^* = -\tfrac{1}{2} p \qquad (4.59)$$

We appear to be no further forward, since we don’t know what this mysterious
‘adjoint variable’ is! In fact, p satisfies another differential equation:

$$\frac{dp}{dt} = -\frac{\partial H}{\partial x} \qquad (4.60)$$

where in (4.60) the notation $\partial H/\partial x$ means the partial derivative of H with
respect to x, regarding everything except x as a constant. Indeed, since H in
(4.57) doesn’t contain x at all, we have $\partial H/\partial x = 0$, so (4.60) becomes

$$\frac{dp}{dt} = 0$$

From this we deduce that p = a, where a is a constant. We're now getting
nearer, since we can say from (4.59) that the optimal control is

$$u^* = -\tfrac{1}{2} a$$

It remains to find the value of the constant a, and to do this we use the
conditions x(0) = 0, x(1) = 1. The original differential equation (4.50) is now

$$\frac{dx}{dt} = u^* = -\tfrac{1}{2} a$$

and integrating this with respect to t gives

$$x = -\tfrac{1}{2} a t + b$$

where b is another ‘constant of integration’. However, since x = 0 when t = 0 it
follows that b = 0, so that x reduces to

$$x = -\tfrac{1}{2} a t$$

Finally, since x = 1 when t = 1 we must have a = −2, so that the optimal control
is indeed

$$u^* = -\tfrac{1}{2} a = 1$$

as we suggested above.
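
For this particular example the result can also be confirmed by a direct argument that does not use the hamiltonian. The short derivation below is a sketch: the function v is introduced here purely for illustration, and it is assumed that every admissible control can be integrated from t = 0 to t = 1.

```latex
% Any control that transfers x(0) = 0 to x(1) = 1 must satisfy
\int_0^1 u \, dt = x(1) - x(0) = 1 .
% Write u = 1 + v, so that v integrates to zero over [0, 1]:
\int_0^1 v \, dt = 0 .
% Substituting into the objective function (4.55):
J = \int_0^1 (1 + v)^2 \, dt
  = 1 + 2 \int_0^1 v \, dt + \int_0^1 v^2 \, dt
  = 1 + \int_0^1 v^2 \, dt \;\ge\; 1 .
```

Equality holds only when v is zero, that is when u = 1, which agrees with the optimal control found from (4.57) to (4.60).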
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M@ EXAMPLE 4.25 


Let's look at a slightly more complicated case where the system is described by 
the single equation 
dx 
dt 


instead of (4.50), but the objective function J to be minimized remains the 
expression in (4.55), and the initial and final conditions (4.51) and (4.52) are also 
the same. As before we introduce an adjoint variable p and multiply it by the 
right-hand side of (4.61), and add on u* to produce the hamiltonian 


=x+u (4.61) 


H= p(x+ u)+u? 
= px+ put u? (4.62) 


The optimal control u* is given by (4.58). However, the terms involving u in 
(4.62) are the same as those in the previous example in (4.57), so as before we 
have 


$$\frac{\partial H}{\partial u} = p + 2u$$


and setting this equal to zero as in (4.58) gives the optimal control

$$u^* = -\tfrac{1}{2}p$$

However, this time H in (4.62) contains a term px, so the partial derivative
∂H/∂x is equal to p. Equation (4.60) therefore becomes

$$\frac{dp}{dt} = -p$$

The solution of this equation is
$$p = ae^{-t} \qquad (4.63)$$


where a is a constant of integration, as you can verify by differentiation (we 
discussed equations like (4.63) in Exercise 3.18 in Chapter 3). We therefore have 


$$u^* = -\tfrac{1}{2}p = -\tfrac{1}{2}ae^{-t} \qquad (4.64)$$
Substituting the expression (4.64) into the state equation (4.61) gives us
$$\frac{dx}{dt} = x - \tfrac{1}{2}ae^{-t} \qquad (4.65)$$


This isn’t the place to go into solving differential equations like (4.65) — there are 
very many books where this is covered. We'll simply state that the solution of 
(4.65) is 

$$x = be^{t} + \tfrac{1}{4}ae^{-t} \qquad (4.66)$$
where b is a constant of integration, and leave it to you to verify, if you
wish, that when (4.66) is differentiated it satisfies (4.65). The values of the
constants a and b in (4.66) are found by using the conditions at t=0 and t=1.
These give
$$x(0) = 0: \quad b + \tfrac{1}{4}a = 0$$
$$x(1) = 1: \quad be + \tfrac{1}{4}ae^{-1} = 1$$
and some simple algebra to solve these simultaneous equations produces
$$a = \frac{-4}{e - e^{-1}}, \qquad b = \frac{1}{e - e^{-1}}$$
Hence from (4.64) we obtain the optimal control to be
$$u^* = \frac{2e^{-t}}{e - e^{-1}}$$
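
As a check on the algebra in this example, here is a short sketch which evaluates the expressions (4.63), (4.64) and (4.66) with the constants a and b found above, and confirms that the end conditions and the state equation (4.61) are satisfied. The function names and the central-difference estimate of dx/dt are illustrative assumptions.

```python
import math

# Constants found in Example 4.25 from x(0) = 0 and x(1) = 1.
a = -4.0 / (math.e - 1.0 / math.e)
b = -a / 4.0

def p(t):
    """Adjoint variable (4.63)."""
    return a * math.exp(-t)

def u_star(t):
    """Optimal control (4.64): u* = -p/2."""
    return -0.5 * p(t)

def x(t):
    """State (4.66)."""
    return b * math.exp(t) + 0.25 * a * math.exp(-t)

def residual(t, h=1e-6):
    """dx/dt - (x + u*), with dx/dt estimated by a central difference."""
    dxdt = (x(t + h) - x(t - h)) / (2.0 * h)
    return dxdt - (x(t) + u_star(t))

print("x(0) =", x(0.0), " x(1) =", x(1.0))
print("largest residual of (4.61):", max(abs(residual(k / 10)) for k in range(11)))
print("u*(0) =", u_star(0.0), " closed form:", 2.0 / (math.e - 1.0 / math.e))
```

The residual is zero only up to rounding and the error of the finite-difference estimate, so expect a very small number rather than exactly zero.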


EXERCISE 4.38 Repeat Example 4.25 if the system is to be transferred from x(0)=2 to
x(1)=0.
EXERCISE 4.39 A system is described by the equation 


ae 
dt 


The control u is to be chosen so as to minimize the objective function 


fea 
0 


whilst transferring the system from x(0) = 1 to x(1)=0. Show that the optimal control is 


So far we have only considered control systems described by a single differential 
equation. As we saw in Chapter 3, realistic models of systems will involve many 
state variables. Naturally, the mathematics required to find an optimal control 
becomes more complicated, but the basic step of constructing a hamiltonian function 
remains unaltered. 


EXAMPLE 4.26


Consider a carriage which runs along smooth straight rails, and has a rocket
motor at each end, as shown in Figure 4.50. This can perhaps be thought of as
the ultimate high-speed train, hurtling through the Channel Tunnel!

Figure 4.50

According to Newton’s law of motion given in equation (3.3) in Chapter 3, if we assume
for simplicity that the carriage has unit mass then the equation of motion is just

$$\frac{d^2x}{dt^2} = u \qquad (4.67)$$

where u is the net force exerted by the rocket motors. The problem is to
transfer the carriage from rest at its starting point (the origin 0) to rest at its
final destination (taken for convenience to be x=1 at time t=1), whilst
minimizing the total energy required, namely

$$\int_0^1 u^2\,dt \qquad (4.68)$$

which is once again the same expression as in (4.55). We first define as the state
variables the position x and velocity dx/dt of the carriage, that is

$$x_1 = x, \qquad x_2 = \frac{dx}{dt} \qquad (4.69)$$


so that the relationship between these variables is 


$$\frac{dx_1}{dt} = x_2 \qquad (4.70)$$
Also, the equation of motion (4.67) can now be written as
$$\frac{dx_2}{dt} = u \qquad (4.71)$$

since dx₂/dt = d²x/dt². We now have two differential equations (4.70) and (4.71)
describing the state of the system. The definition of the hamiltonian function
now requires two adjoint variables, one multiplying the right-hand side of
(4.70) and the other the right-hand side of (4.71), so we get

$$H = p_1x_2 + p_2u + u^2 \qquad (4.72)$$


where as before we add on the quantity u² from inside the integral (4.68) to be
minimized. The optimal control u* is still given by setting ∂H/∂u = 0 as in (4.58),
where in the differentiation of (4.72) we regard as constant all the variables
except u. This gives us

$$p_2 + 2u = 0$$
so that

$$u^* = -\tfrac{1}{2}p_2 \qquad (4.73)$$
As in (4.60), each adjoint variable satisfies a differential equation, and these are
now

$$\frac{dp_1}{dt} = -\frac{\partial H}{\partial x_1}, \qquad \frac{dp_2}{dt} = -\frac{\partial H}{\partial x_2} \qquad (4.74)$$
As before, the notation ∂H/∂x₁ means differentiate H with respect to x₁,
regarding everything else as constant; similarly for ∂H/∂x₂. Applying (4.74) to H
in (4.72) produces

$$\frac{dp_1}{dt} = 0, \qquad \frac{dp_2}{dt} = -p_1 \qquad (4.75)$$
since H does not contain x₁, and the term in x₂ is p₁x₂. It follows from the first
equation in (4.75) that p₁ = a, where a is a constant. Hence from the second
equation in (4.75) we have

$$\frac{dp_2}{dt} = -a$$
which gives

$$p_2 = -at + b$$


where b is another constant of integration. The optimal control is therefore 
given by (4.73) as 


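$$u^* = \tfrac{1}{2}at - \tfrac{1}{2}b \qquad (4.76)$$
and substituting this into (4.71) gives

$$\frac{dx_2}{dt} = \tfrac{1}{2}at - \tfrac{1}{2}b$$

Integrating with respect to t produces
$$x_2 = \tfrac{1}{4}at^2 - \tfrac{1}{2}bt + c \qquad (4.77)$$


where c is another constant. We can now put (4.77) into (4.70) and integrate yet
again with respect to t to obtain


$$x_1 = \tfrac{1}{12}at^3 - \tfrac{1}{4}bt^2 + ct + d \qquad (4.78)$$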


where d is a further constant. We now use the conditions at t=0 and t=1 to
obtain the values of the constants a, b, c, d. Since the system starts at the
origin with zero velocity x₂ we have

$$t = 0, \quad x_1 = 0, \quad x_2 = 0$$

which when substituted into (4.77) and (4.78) gives c=0, d=0. At the end of the
journey when x₁ = 1 and again the velocity x₂ is zero we have

$$t = 1, \quad x_1 = 1, \quad x_2 = 0$$
which when substituted into (4.77) and (4.78) gives

$$\tfrac{1}{4}a - \tfrac{1}{2}b = 0$$

$$\tfrac{1}{12}a - \tfrac{1}{4}b = 1$$
Solving these equations gives a = -24, b = -12, so finally from (4.76) the
optimal control is

$$u^* = -12t + 6 \qquad (4.79)$$


Notice that this optimal control in (4.79) starts off at u* = 6 units, decreases to
zero at t = ½ and ends up at -6 units. This is what we would expect: referring to
Figure 4.50, in the first half of the journey the right-hand rocket is switched off,
the left-hand rocket accelerates the carriage by starting off at 6 units and
steadily reducing to zero; in the second half of the trip the left-hand rocket is
off, the right-hand rocket decelerates the carriage by steadily increasing its
thrust from zero to 6 units (the negative sign occurs in u* because this thrust is
in the negative x-direction).
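
The constants in this example can be checked with exact fractions. The sketch below solves the two end conditions for a and b by Cramer's rule, then tabulates the control (4.76), the velocity (4.77) and the position (4.78) at a few times. The variable names and the choice of Cramer's rule are illustrative assumptions; any method of solving two simultaneous equations would do.

```python
from fractions import Fraction as F

# End conditions at t = 1 from (4.77) and (4.78), with c = d = 0:
#   a/4  - b/2 = 0
#   a/12 - b/4 = 1
# Solve this 2x2 system exactly by Cramer's rule.
a11, a12, r1 = F(1, 4), F(-1, 2), F(0)
a21, a22, r2 = F(1, 12), F(-1, 4), F(1)
det = a11 * a22 - a12 * a21
a = (r1 * a22 - a12 * r2) / det
b = (a11 * r2 - a21 * r1) / det

def u_star(t):
    """Optimal control (4.76)."""
    return a * t / 2 - b / 2

def x2(t):
    """Velocity (4.77) with c = 0."""
    return a * t**2 / 4 - b * t / 2

def x1(t):
    """Position (4.78) with c = d = 0."""
    return a * t**3 / 12 - b * t**2 / 4

print("a =", a, " b =", b)
for t in (F(0), F(1, 4), F(1, 2), F(3, 4), F(1)):
    print(f"t = {t}: u* = {u_star(t)}, x1 = {x1(t)}, x2 = {x2(t)}")
```

The table shows the thrust falling steadily from 6 to -6, with the carriage reaching its greatest speed at the half-way time and coming to rest at x₁ = 1.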


EXERCISE 4.40 With the optimal control in (4.79) determine the value of the objective 
function in (4.68). 


Our discussion of controllability in Section 3.2 of Chapter 3 assumed that there 
were no restrictions on the magnitudes of the control variables, but in practice these 
are very often present. To get some idea of what happens in such cases, let’s 
continue with our rocket-propelled carriage which is still to make the same trip from 
rest at x=0 to rest at x=1. However, suppose now that the rockets can exert a 
maximum thrust of 1 unit. Since each can fire in only one direction, the net thrust 
can be at most 1 unit to the left or right, that is -1 ≤ u ≤ 1.

The control (4.79) could then not be used, since it required thrusts greater than 1 
unit. To simplify matters, we'll consider minimizing the total time T of the journey, 
instead of minimizing energy used. It turns out that the optimal strategy in this case 
is 


$$u^* = +1, \quad 0 < t < \tfrac{1}{2}T$$


$$u^* = -1, \quad \tfrac{1}{2}T < t < T \qquad (4.80)$$


This means that for the first half of the trip the left-hand rocket is used at maximum
thrust, giving maximum possible acceleration, and for the second half of the trip the
right-hand rocket is also used at maximum thrust, giving maximum possible
deceleration. Thus no intermediate values of rocket thrust are used: a motor is either
full on, or off. Because of this property, the optimal control (4.80) is given the
graphic name ‘bang-bang control’, a rare example of informal language being used
as a technical term! In fact the concept of bang-bang control clarifies our intuitive
discussion at the beginning of this section on how to drive a car from one set of
traffic lights to another as quickly as possible: ‘bang’ the foot down on the
accelerator pedal, and then on the brake! However, investigation of how we derive
bang-bang controls is well beyond the mathematical level of this book. Instead, we
end by giving a couple of examples of interesting optimal control problems, without
attempting to solve them.
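
Although we don't derive the bang-bang control here, it is easy to simulate. The following sketch applies the control (4.80) to the carriage equations (4.70) and (4.71), starting from rest at the origin, and searches by bisection for the journey time T at which the carriage has travelled exactly 1 unit. The function names, the step count and the bisection search are illustrative assumptions. Since the carriage accelerates at 1 unit for a time T/2 and then decelerates at the same rate, the distance covered is T²/4, so the search should settle very close to T = 2.

```python
def bang_bang(t, T):
    """Control (4.80): full thrust forward for the first half, full reverse after."""
    return 1.0 if t < T / 2 else -1.0

def simulate(T, steps=2_000):
    """Integrate dx1/dt = x2, dx2/dt = u from rest at the origin up to time T."""
    h = T / steps
    x1 = x2 = 0.0
    for k in range(steps):
        u = bang_bang((k + 0.5) * h, T)   # control at the middle of the step
        x1 += x2 * h + 0.5 * u * h * h    # exact update while u is constant
        x2 += u * h
    return x1, x2

# Bisection on T: the distance covered increases with T.
lo, hi = 0.0, 10.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if simulate(mid)[0] < 1.0:
        lo = mid
    else:
        hi = mid
T = 0.5 * (lo + hi)
print("journey time T =", round(T, 6), " final state =", simulate(T))
```

The final velocity is zero apart from rounding, confirming that the carriage does arrive at rest.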


EXAMPLE 4.27


A spacecraft is a distance h from an asteroid whose mass is so small that its
gravitational attraction can be ignored. A small landing vehicle separates
from the spacecraft at t=0 with initial downwards velocity v, and the
objective is to achieve a ‘soft landing’ on the asteroid, that is, at the instant
of touchdown (time t = T) the vertical velocity is to be zero. Only ‘vertical’
motion is considered. Let x₁ be the altitude and x₂ the velocity of the lander
at time t. The equations of motion are just (4.70) and (4.71). Suppose that it
is required to minimize a combination of fuel used and the time T to landing.
We also suppose that the rocket motor can fire either up or down, with
maximum magnitude 1 unit. The total fuel consumption can be represented
as

$$C = \int_0^T |u|\,dt$$

where |u| denotes the magnitude of the control, that is ignoring its sign, since
fuel is used whatever the direction of the rocket thrust. The overall objective
function to be minimized is C + T. It can be shown that provided v²/2 < h < 5v²/2
then the optimal control in this case has ‘zero-bang’ form, that is

$$u^* = 0, \quad 0 < t \le t_1$$
$$u^* = 1, \quad t_1 < t < T$$
where t₁ = h/v - v/2, and the minimum time to landing is

$$T = \frac{h}{v} + \frac{v}{2}$$
Thus the vehicle coasts down at constant velocity v with the motor off, until at
time t₁ the motor is switched on to give maximum possible upwards thrust, so
that on reaching the asteroid’s surface the vehicle touches down with zero
velocity.
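
The zero-bang strategy can be followed through with simple kinematics. In the sketch below the values h = 6 and v = 2 are illustrative (they satisfy v²/2 < h < 5v²/2), and we assume the sign convention that altitude is measured upwards, so that the initial velocity is x₂ = -v and u = +1 is an upwards thrust. The function name is hypothetical.

```python
def zero_bang_landing(h, v):
    """Coast, then brake at full thrust; returns (t1, T, altitude, velocity, cost)."""
    t1 = h / v - v / 2          # switching time
    T = h / v + v / 2           # touchdown time
    # Coasting phase, u = 0: velocity stays at -v (downwards).
    altitude = h - v * t1
    velocity = -v
    # Braking phase, u = +1 for a duration T - t1.
    tau = T - t1
    altitude += velocity * tau + 0.5 * tau**2
    velocity += tau
    fuel = tau                  # integral of |u|, with |u| = 1 while braking
    return t1, T, altitude, velocity, fuel + T

h, v = 6.0, 2.0                 # illustrative values with v**2/2 < h < 5*v**2/2
t1, T, altitude, velocity, cost = zero_bang_landing(h, v)
print(f"switch at t1 = {t1}, land at T = {T}")
print(f"touchdown altitude = {altitude}, velocity = {velocity}, C + T = {cost}")
```

With these values the motor is switched on at t₁ = 2 and the lander reaches the surface at T = 4 with zero velocity. The sketch only confirms that the stated strategy produces a soft landing; it does not show that the strategy is optimal.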


EXAMPLE 4.28 Cash balance model


A firm has a known demand for cash over a period of time T. In order to meet 
this demand the firm must have access to funds, in the form of either actual 
cash or investments. There are two conflicting aspects of this: if the firm holds 
too much cash then it loses money which could be earned by investments; 
alternatively, if too little cash is held then the firm has to sell investments to 
meet the cash demand, and thereby incurs a broker's commission. The problem 
is to find an optimal trade-off between the amounts kept in cash and 
investments. 

Let x₁ and x₂ be the balances held in cash and investments at time t, and
let the known demand be d, which will vary with time. The control variable u
is the amount of investments bought or sold, where a negative value means
a purchase. In practice u will be subject to limits on the amounts which can
be bought or sold, but a cost a|u| is incurred for each transaction, where a is
the broker's commission rate. Let r₁ and r₂ be the rates of interest on the
cash and investments respectively. The equations describing the state of the
system are

$$\frac{dx_1}{dt} = r_1x_1 - d + u - a|u|$$

Here the left-hand side is the rate of increase of the cash balance, and the terms
on the right are, in order, the interest earned on cash, the cash paid out, the cash
from investments sold, and the broker's commission.

$$\frac{dx_2}{dt} = r_2x_2 - u$$

Here the left-hand side is the rate of increase of the investment balance, and the
terms on the right are the interest earned on investments and the investments sold
for cash.

Starting with known balances x₁(0), x₂(0), the problem is to determine the
control which maximizes the net sum x₁(T) + x₂(T) held by the firm at the end
of the period under consideration. The optimal control turns out to have
‘bang-zero-bang’ form, where investments are either bought or sold in the
maximum allowable amounts, or not at all.
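
To get a feel for how the two state equations behave, here is a minimal simulation sketch using Euler's method. All the numbers (demand, interest rates, commission rate, starting balances) and the two trial policies are hypothetical, chosen only for illustration; neither policy is claimed to be the optimal bang-zero-bang control.

```python
def simulate(x1, x2, u, d, r1, r2, a, T, steps=10_000):
    """Euler integration of the two state equations; returns x1(T), x2(T)."""
    h = T / steps
    for k in range(steps):
        t = k * h
        uk = u(t)
        dx1 = r1 * x1 - d(t) + uk - a * abs(uk)   # cash balance equation
        dx2 = r2 * x2 - uk                        # investment balance equation
        x1, x2 = x1 + h * dx1, x2 + h * dx2
    return x1, x2

# Hypothetical data: constant demand, no interest on cash, 8% on investments,
# 2% commission, over one time unit.
params = dict(d=lambda t: 10.0, r1=0.0, r2=0.08, a=0.02, T=1.0)

policies = {
    "sell to match demand": lambda t: 10.0 / (1.0 - 0.02),
    "sell nothing":         lambda t: 0.0,
}
for name, u in policies.items():
    x1T, x2T = simulate(20.0, 100.0, u, **params)
    print(f"{name:22s} x1(T) = {x1T:8.3f}  x2(T) = {x2T:8.3f}  sum = {x1T + x2T:8.3f}")
```

Under the first policy the sales, net of commission, exactly cover the demand, so the cash balance stays constant; under the second the cash balance is run down while the investments grow untouched. Comparing the final sums for different policies gives some idea of the trade-off the optimal control has to resolve.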


EXERCISE 4.41 A system is described by the equation 


$$\frac{dx}{dt} = u$$
where the control u is to be chosen so as to minimize the objective function

$$\int_0^1 (x^2 + u^2)\,dt$$

whilst transferring the system from x(0)=0 to x(1)=1. Construct the hamiltonian,
and show that under the influence of the optimal control u* the system satisfies the
equation

$$\frac{d^2x}{dt^2} = x$$

Verify that the solution of this equation is
$$x = \frac{1}{e - e^{-1}}\,(e^{t} - e^{-t})$$
and show that
$$u^* = \frac{1}{e - e^{-1}}\,(e^{t} + e^{-t})$$


EXERCISE 4.42 An inventory-control production-scheduling problem is described by the
equation
$$\frac{dI}{dt} = P$$
where I is the inventory level and P is the production rate, after known sales demand
has been met. It is required to control the production rate P from time t=0 to t=T so
as to minimize the objective function

$$\int_0^T (I^2 + P^2)\,dt$$
Set up the hamiltonian function, and show that the adjoint variable p satisfies the
equation
$$\frac{d^2p}{dt^2} = p$$
Hence show that the optimal control has the form
$$P^* = ae^{t} + be^{-t}$$


where a and b are constants.


4.1 Use Fibonacci search to determine the value of x which maximizes the volume of the
box in Exercise 4.2 to within an interval of uncertainty of length not more than
0.4 cm.


4.2 Use Fibonacci search with eight function evaluations to obtain the approximate value
of x which maximizes 
foe 223, 107665 
1+x? 


4.3 Use Fibonacci search with eight function evaluations to estimate the optimal radius of
the beer can in Example 4.1, assuming 0<x<5. Repeat with 2<x<5 and seven
function evaluations.

If you are familiar with the calculus approach, use it to obtain a more accurate
solution.

Measure the radius of an actual 440 ml can. Why do you think it differs from the 
‘optimal’ value you have calculated? 


4.4 A tourist bus company has 19 minibuses each of which can carry 18 passengers, and
17 larger coaches which can carry 35 passengers. The company employs 30 drivers
and 35 guides. Each bus carries only a single driver. The minibuses require only a
single guide each, but two guides are carried on each of the larger buses. It is required
to carry as many passengers as possible at any one time. Let x₁ denote the number of
minibuses and x₂ the number of larger buses to be used. Express the problem in LP
form. Sketch the feasible region, and find the optimal solution graphically and
algebraically.


4.5 A tyre manufacturer operates two factories. Factory A produces 100 Super Tyres, 300
Excellent Tyres and 500 Budget Tyres per day. Factory B produces 200 of each kind
of tyre per day. The manufacturer has an order for 8000 Super, 16 000 Excellent and
20 000 Budget Tyres. The daily running costs for each factory are £20 000. The
problem is to determine how many days each factory should be operated to fulfil the
order as cheaply as possible.

Let x₁ and x₂ be the numbers of days factories A and B are open. Express the
problem in LP form and solve graphically and algebraically.


4.6 A refinery makes three grades of petrol (P₁, P₂, P₃) from three crude oils (c₁, c₂, c₃).
Crude type c₃ can be used in any grade, but the others must satisfy the following
specifications:


Grade of petrol   Specification             Selling price
                                            (pence per litre)
P₁                Not less than 45% c₁      65.3
                  Not more than 25% c₂
P₂                Not less than 25% c₁      53.5
                  Not more than 60% c₂
P₃                No restrictions           52.1


There are capacity limits on the availabilities of the three crude oils, as follows:


Crude   Available capacity   Cost
        (kilolitres)         (pence per litre)
c₁      150                  61.2
c₂      160                  50.8
c₃      80                   54.9


Let x_ij litres be the amount of crude oil c_i used to make petrol P_j. Write down the
constraints and the profit function which is to be maximized in the form of an LP
problem.


4.7 Find an optimal solution to the transportation problem having the following table of
availabilities, requirements and costs: 
Availabilities 
15, 20), 80) 35 
4.8 Consider the LP problem:
Minimize 9 2z=3x,+2x,+4x, 
subject to: x, +3x,+8x,=4 
X_+12x,+x,=5 
X, Xq, Xz, X, 20 


Starting with the feasible solution x,=4, x,=0, x,=0, x,=5 use the simplex 
technique to obtain an optimal solution. 


4.9 An airline operates three types of aircraft on three different routes. The numbers of
passengers (in thousands) which can be carried annually by each aircraft type are as 
follows: 


Aircraft type 


The available numbers of aircraft of each type are 15, 21 and 20 respectively. The 
costs per aircraft per year are (in certain units) as follows: 


Type 


The estimated numbers of passengers per year to be carried are as follows:


Route   Numbers       Income per
        (thousands)   1000 passengers
1       290           15
2       200           17
3       230           10


It is required to allocate aircraft to routes so as to minimize the annual operating cost.
If demand exceeds capacity, the cost of a ‘lost’ passenger is the amount of income
lost.

Let


x_ij = number of aircraft on route i of type j
x_i = number of passengers (in thousands) who cannot be accommodated on
route i


Express this problem in LP form.


4.10 In a horse-jumping competition a team has four horses and four riders. From past
experience it is known how many penalty points rider i is likely to incur when riding
horse j. Set this up as an assignment problem with the aim of matching riders to
horses so as to minimize the total of expected penalty points.


4.11 A laboratory contains 25 microcomputers which must be connected to a power supply
having four sockets. Connections are made using extension leads which each have 
four power sockets. Draw a tree which shows the minimum number of extension 
leads needed so that all the computers have power. 


4.12 A box contains four black, five white and eight red balls. Two balls are taken out, one
at a time, without replacement. Draw a tree representing all the possible outcomes.
Label each arc with the probability that the appropriate event occurs.

What is the probability that one of the balls is red and the other is black? 


4.13 A gambler decides to play at most five games of roulette. At each play the gambler
either wins or loses £10. The gambler will stop before playing five games if he or she 
either goes broke, or wins a total of £30. The gambler begins with £10. Draw a tree 
showing all the possible outcomes. 

In how many of these would the gambler withdraw before playing five games? 


4.14 A graph G consisting of m individual trees is called a forest. Show that if G has an
overall total of n vertices then it has a total of n - m arcs.

Hence deduce that if a graph has n vertices, n—1 arcs and contains no cycles 
then it is a tree. 


4.15 The network in Figure 4.51 shows road distances in kilometres between certain towns
and cities. A communications company wishes to connect together all these places by
laying cables alongside these main roads. Find a layout which uses the least cable.

Figure 4.51


4.16 Consider the graph in Figure 4.52. The (3,4) and (4,3) elements of its adjacency
matrix A are each equal to 2, because there are two arcs connecting vertices 3 and 4.
The (1, 1) element is equal to 1 because of the arc connecting vertex 1 to itself. Write
down A and determine A². List the walks involving two arcs corresponding to the
elements of A².

Figure 4.52


4.17 If a graph G has n vertices then to test whether it is connected, construct the
adjacency matrix A and form the sum

$$S = A + A^2 + A^3 + \cdots + A^{n-1}$$

It can be shown that G is connected if and only if S has no zero elements.
Apply this procedure to the graph in Figure 4.53.

Figure 4.53
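
The procedure described in Problem 4.17 is easily programmed. The sketch below is one possible implementation using plain lists for the matrices; the function names and the two small test graphs are illustrative assumptions (they are not the graph of Figure 4.53, which is left for you to try).

```python
def mat_mul(A, B):
    """Product of two square matrices held as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_add(A, B):
    """Element-by-element sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def is_connected(A):
    """Form S = A + A^2 + ... + A^(n-1) and test S for zero elements."""
    n = len(A)
    power, S = A, A
    for _ in range(n - 2):
        power = mat_mul(power, A)
        S = mat_add(S, power)
    return all(entry != 0 for row in S for entry in row)

# Two hypothetical graphs on four vertices.
path = [[0, 1, 0, 0],        # arcs 1-2, 2-3, 3-4
        [1, 0, 1, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 0]]
two_pieces = [[0, 1, 0, 0],  # arcs 1-2 and 3-4 only
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]]
print(is_connected(path), is_connected(two_pieces))
```

For the first graph every element of S is non-zero, so the test reports that it is connected; for the second, vertices 1 and 3 are never joined by a walk, so the corresponding element of S stays at zero.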


4.18 Trees can be used to decode messages where codewords have variable lengths. As a
simple example consider the code


A    B    C    D
0    10   110  111


This can be represented by the tree in Figure 4.54.

Figure 4.54

Any received string of binary digits (assumed error free) can be decoded
uniquely. This is because no codeword is the start of any other codeword.

Starting from vertex S, simply trace paths on the tree: for example, to decode
010111, the initial 0 takes you to A; returning to S, the bits 10 take you to B; finally,
111 takes you from S to D, so the message is decoded as ABD.

Construct the tree for the code


A    B    C    D    E
00   01   10   110  111


Decode the received messages


0001111110, 0000111101011001
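
The tree-tracing procedure in Problem 4.18 can be sketched in a few lines. Here the tree is held as nested dictionaries, which is just one convenient assumption, and the function names are illustrative. The sketch is applied only to the four-letter code and the worked string 010111 from the problem statement, leaving the five-letter code for you.

```python
def build_tree(code):
    """Nested-dict binary tree: follow '0'/'1' branches from the root S to a letter."""
    root = {}
    for letter, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = letter
    return root

def decode(bits, code):
    """Trace paths from the root, emitting a letter each time a leaf is reached."""
    root = build_tree(code)
    node, message = root, []
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):   # reached a leaf: emit the letter, return to S
            message.append(node)
            node = root
    if node is not root:
        raise ValueError("string ends part-way through a codeword")
    return "".join(message)

code = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode("010111", code))   # prints ABD, as in the worked example
```

Because no codeword is the start of any other, the decoder never has to look ahead or back-track: each received bit simply moves one step down the tree.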


4.19 A control system is described by the single equation

$$\frac{dx}{dt} = -2x + 2u \qquad (4.81)$$

where the control u is to be chosen so as to minimize the objective function

$$\int_0^1 (3x^2 + u^2)\,dt$$

whilst transferring the system from x(0)=0 to x(1)=1. Show that under the influence
of the optimal control u* the state variable satisfies the equation

$$\frac{d^2x}{dt^2} = 16x$$

and deduce from (4.81) that

$$u^* = \frac{3e^{4t} + e^{-4t}}{e^{4} - e^{-4}}$$
4.20 A control system is described by the equations

$$\frac{dx_1}{dt} = 3x_2$$

$$\frac{dx_2}{dt} = -2x_1 + 5x_2 + u$$

The control u is to be chosen so as to transfer the system from some given initial state
to some given final state at time t=1 whilst minimizing the objective function

$$\int_0^1 u^2\,dt$$

Set up the hamiltonian function involving two adjoint variables p₁ and p₂. Show that
u* = -½p₂ where

$$\frac{d^2p_2}{dt^2} + 5\frac{dp_2}{dt} + 6p_2 = 0$$

Verify that this differential equation has solution
$$p_2 = c_1e^{-2t} + c_2e^{-3t}$$

where c₁ and c₂ are constants.
Chapter 2
2.3. error, 00110, error, error 
24° 3 
2.5 Errors in even-numbered digits of 5 in first case or 2, 4, 6 or 8 in second case are not 
detected. All transposition errors detected in first case, but not in second case if 
Xj-Xy,=+5. 
2.6 (i) 7o’clock (ii) 7 o’clock (iii) 8 o’clock 
2.8 All transpositions are detected (note: for x,€9 xs, x54, Visual inspection of passport 
holder detects |x,-x5|=5, | xs—5|=5). 
2.9 (a) 2 (b) 3 
2.10 (a) 101010 (b) 010101 (c) more than one error 
2.11 d=S, detects four errors 
2.13 (a) no (b) yes (c) yes 
2.14 (a)d=4 (b) d=2 
1 
2.15 (a) | (b) | 0 
& 1 
2.16 0000000, 1111001, 1110010, 1010100, 0100110, 0001011, 0101101, 1011111 
2.17 0011010, 0101001, 1010110, 1100101, 0110011, 1001011, 0011101, 0101110, 
0000111, 1111111 
2.19 (a) 01001 (b) 11110, 10111 
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2.20 


2.21 
2.22 


2.23 


2.24 


2.27 


2.28 
2.31 
2.32 
2.33 
2.34 
2.38 
2.42 


Answers to Exercises 


(a) five (b) six 


three check bits 
(a) six (b) 10 (c) many possible choices 


0 
(a) |} 1} (b) 10110, 11101, 10110 

1 

(a) 011001, 110101 

(b) 001111, 110101 (two errors); 101100 (three errors) 


(b) (i) 0010011101 (ii) 1001010011 
(c) (i) 001001 (ii) 010101 (iii) more than one error 


11101100010 

(a) no (b) yes 

check digit=5 

6 

(b) 0 (c) 0471621773, 0481621873, 0471623873 
0206241909, 0612960587 

4003101715 


Chapter 3 


3.1 


3.9 


3.10 


3.11 


3.12 
3.13 
3.14 
3.17 
3.18 


£443.69 


a ee 
A=| 0 -u/m 0 |, B= 


(c) u(1) =-1, u(0) =3 
a=lor2 

not controllable 
a=0,1lor3 


controllable 




3.19 (b) x(0)=-4 ie] 
3.20 B=-lor - ; 
3.21 observable 

3.23 yes, observable 


3.24 yes, observable 
3.25 «=| 3] 


3.26 f=[2, -3| 
3.27 f=[-19, -6,-11] 


3.29 second fixed eigenvalue = —1; f; =f, — 1, f, arbitrary 
330 fi=-4.4 <-} 


3.31 (a) k=10 (b) k #10 (c) impossible 

3.32 (a)3 (b)2 (c)3 

3.33 controllable 

3.34 controllable when u, =0, not controllable when u,=0 
3.35 a=Oorl 

Sar 63 

3.39 observable 


Chapter 4 


4.1 x(250-x)

4.2 4x³ - 300x² + 5000x, 0 < x < 25
4.3 0.7143 < x* < 0.80955

4.4 N=11

4.5 1.882 < x* < 2.058

4.6 122.8 127.2


4.7 Maximize z = 50x₁ + 50x₂
subject to: x₁ + 2x₂ ≤ 80, 3x₁ + 2x₂ ≤ 120; x₁ ≥ 0, x₂ ≥ 0


49 x, =20, x, = 30, zee = £2500 
410 x; =40, x7 =0, Za = £3200 
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401 
4.12 


4.13 


4.14 


4.17 


4.18 


4.19 
4.21 
4.22 
4.23 
4.24 
4.25 
4.26 


4.27 
4.30 


4.32 


4.33 


4.35 




mm = | 
@) 4 =5,%2=5) 


x, = 1 if athlete i runs in race j; = 0 otherwise. 
cy = times in table 
Minimize (4.43) subject to (4.41) and (4.42), 


Subjects; Mathematics, Science, Economics, History, French 
Teachers: T, T, T; T, T;; or T,T, T, T; T, 

(a) cycle (b) walk from A, to A, 
39 

s345f 

Shortest paths are s345f or s34f. 
(a) 14 (b) 19 (c) 26 

white 


(c) path from A; to A, 


Total of 19 matches: four in round 1, eight in round 2, four in round 3, two in round 4, 
one final. 


10 different outcomes 


Bradford—Manchester—Sheffield—York—Newcastle; or Bradford—Newcastle, Bradford— 
Manchester, Bradford—Sheffield, Bradford—York 
minimum length = 325 km 


{(1, 2), (2, 7), (6, 7), (7, 5), GS, 4), 4, 3)} 


{(2, 1), (1, 6), (6, 7), (7, 5), (S, 4), (4, 3)} 
length = 130 




4.36 A>= 


ul 
RPNNEwW 


437 J=%,J=n'/Qn-1) > asneo 


4.38 u* = -4e^{-t}/(1 - e^{-2})
4.40 12 
Answers to Problems


Chapter 1 


11 r=6.20 

12 x= (T2P=1 a (pt Ds 

130 xy +k(k+1)(2k+1)/6 

14%, =0.62%, 44) = 0.87y, + 0.38%, 

; = (0.62)*xp, yy = [5.56(0.87)* — 4.56(0.62)']y 

L115, = 21075"! -(-5)*4} 

113 xy =KR(K+ 17/4 

1.15 p= (a* =a? -a’),p + q;1—k/b,p=q 

116 x, =c,(1N2)* + on - 1N2)* + 4 cf3 -(- 14 
a= 


1.17 ax) + 5 atx, + ke" 


2=2(1+V/igR) 


119 Z(i,)= 
Po az41 


i, = ig cosh wk + =F (iy - 2V/R)sinh wk 


1.21) py, +bepp=a 
Pp, =(— be)‘ + af(1 + bc) 
1.24 x, =50, », = 160 


2 

% | _ 10] 5 

was [sled 
2 


2%o 
1.27 (a) A=|0 2 2|;4,= 4,4, =2iN/3, As = -2i/V3 
1 
300 
1.28 Fi, = (0.096F, + 0.85F, )(1.063)* + (0.904F — 0.85F, )( - 0.113)" 
1300 x%,= gtet 8 months 
131 x,=2'*?-3 
1.32 _ option (ii) 
Chapter 2 
21 (a) 6 (b) 51593-2067 (d) 20782-9960 
2.2 (c) Non-adjacent transpositions detected only if x, is involved. 
2.3 (a) 93471 (c) detected only if x,—x,,,#+5 
2.4 101101110111101, 00110011000 
2.5 0198548273; 0198538723, 0198532873 
2.7 positions 3, 5, 9; magnitudes 4, 2, 7 
2.9 (b) only detected if x,-x,,,#+5 
2.10 9770261307057 
Chapter 3 
3.2 all controllable 
3.3 yes, observable 
3.4 yes, observable 
-1 0 0 0 1 
1 1 
eg ese Wo ley 
Chal One 0 
Om) 107 20 0 
OPO) ae 20: 0 
O10) 30) 1 0 
A= ,B= 
a7 0 O-a 0 b 
dad 0-c 0 d 
3.8 k=20 or 36 
3.9 controllable 
3.11 (a) yes, controllable (b) yes, observable 
313k, #4 
B45 x, =—142°t2- 3! 5 21-24 3*" xo 2 420)" 




Chapter 4 


4.1 10.40 < x* < 10.70
4.2 19 < x* < 21
4.3 N=8, 3.99 < x* < 4.16; N=7, 4.05 < x* < 4.25
calculus solution = 4.12; actual can has x = 3.16
4.4 Maximize 18x₁ + 35x₂
subject to: x₁ + x₂ ≤ 30
x₁ + 2x₂ ≤ 35
x₁ ≤ 19, x₂ ≤ 17, x₁ ≥ 0, x₂ ≥ 0
Solution is x₁ = 19, x₂ = 8.
4.5 Minimize z = 20000(x₁ + x₂)
subject to: x₁ + 2x₂ ≥ 80
3x₁ + 2x₂ ≥ 160
5x₁ + 2x₂ ≥ 200
x₁ ≥ 0, x₂ ≥ 0


Solution is x₁ = 40, x₂ = 20.


4.6 Maximize 4.1x₁₁ + 14.5x₂₁ - 7.7x₁₂ + 2.7x₂₂
+ 10.4x₃₁ - 9.1x₁₃ + 1.3x₂₃ - 1.4x₃₂ - 2.8x₃₃
subject to: 11x₁₁ - 9x₂₁ - 9x₃₁ ≥ 0
x₁₁ - 3x₂₁ + x₃₁ ≥ 0
3x₁₂ - x₂₂ - x₃₂ ≥ 0
3x₁₂ - 2x₂₂ + 3x₃₂ ≥ 0
x₁₁ + x₁₂ + x₁₃ ≤ 150 000
x₂₁ + x₂₂ + x₂₃ ≤ 160 000
x₃₁ + x₃₂ + x₃₃ ≤ 80 000
x_ij ≥ 0, all i and j


4.7 minimum cost = 535 
two solutions 


48 




49 Minimize —17x,, + 19x) + 21x) + 20x23 
+ 163) + 15x39 + 14x33 + 15x, + 17x, + 10x, 
subject to: Xp +X, S15 
Xqq +X3zq < 21 
X13 +Xp3 +X33 < 20 
20x, + 17x42 +x, = 290 
18x49 + 15x53 +X = 200 
19x3) + 18x39 + 17x33 +4 = 230 
Xj 2 0, x; 2 0, all i andj 
4.10 Minimize (4.43) subject to (4.41) and (4.42), where x_ij = 1 if horse j has rider i;
= 0, otherwise; c_ij = penalty points.
4.11 Minimum number of extension leads is seven.
4.12 Nine different outcomes; probability of one red, one black = 4/17.


4.13 Eleven different outcomes; three involve less than five games.
4.15 Minimum length of cable is 758 km.


416 A?= 


4.18 ABED, AAECCDB 


Set at | ee Ae, 4 ee ie 
Index


adjacency matrix, 222 

arc of graph, 204 

Argand diagram, 13 

argument of complex number, 14 
Article Number Association code, 124 
assignment problem, 232 


bang-bang control, 232 
barcode, 74 
bee population model, 9 
binary 
matrix, 94 
numbers, 76 
binomial expansion, 61 
bipartite graph, 205 
bird population model, 11 
bit, 76 
check, 80 
information, 80 
blue whale population model, 35, 171 
buffalo population model, 68, 140, 170 


car rental model, 47, 66 
cash balance model, 233 
cattle ranching model, 66, 170 
characteristic 

equation, 46 

polynomial, 46 
check 

bit, 80 

equations, 96 

matrix, 95 
Chinese postperson problem, 220 
closed loop 

matrix, 154 

system, 128, 154 


code, 76 
Article Number Association, 124 
check bit, 80 
check, equations, 96 
check matrix, 95 
decimal, 81, 110 
dimension of, 98 
European Article Number, 81 
Hamming, 104 
Hamming distance, 86 
linear, 91 
minimum distance, 88 
parity, 80 
perfect, 101 
perfect Hamming, 107 
repetition, 79, 89 
shortened, 107 
syndrome, 103 
Universal Product, 123 
codeword, 76, 77 
compact disc, 77 
complex number 
argument, 14 
modulus, 13 
congruence, 83 
control 
dual system, 175 
system, 127 
variable, 127 
vector, 130 
controllability, 141 
matrix, 143, 144, 145, 159 
critical path analysis, 220 


decimal code, 81, 110 
determinant, 45, 142, 165
diagonalization, 42
diagonal matrix, 38, 42 
difference equation, 2 
first order, 11 
homogeneous, 21 
matrix, 36 
second order, 11 
solution of, 13 
Dijkstra’s algorithm, 207 
dimension of code, 98 
discrete time, 1 
dominant eigenvalue, 53 
dual system, 175 


economic models, 12, 64, 65, 136 
edge of graph, 204 
eigenvalue, 42 

assignment theorem, 155 

dominant, 53 

strictly dominant, 53 
electrically heated oven, 138, 146, 152 
elementary row operations, 163 
European Article Number code, 81 


feasible 
region, 189 
solution, 189 
feedback, 127, 128, 154 
Fibonacci, 6 
numbers, 8, 19, 61, 180 
search algorithm, 181 
finite field, 84, 114 
fish aquarium model, 5, 16, 59 
forest, 238 


Galois field, 114 
gaussian elimination, 161 
geometric 
sequence, 26 
series, 15 
golden rectangle, 8 
graph, 204 
bipartite, 205 
connected, 213 
directed, 205 
disconnected, 213 
undirected, 205 


hamiltonian function, 226 
Hamming, 104 

code, 104 

distance, 86 




Hooke’s law, 131 
hyperbolic functions, 64 


independent columns, 160 
induction proof, 69 
information bits, 80 

input, 127 

integer programming, 201 
interval of uncertainty, 180 
inverse matrix, 41, 142 
inverse z-transform, 30 
ISBN, 74, 84, 111 


Laplace transform, 34 
Leslie matrix, 51 
linear 
code, 91 
combination, 41, 148, 154, 187 
constraints, 186 
feedback, 154 
profit function, 187 
linearity principle, 3 
linear programming (LP), 186 
basic solution, 192 
constraint, 186 
feasible region, 189 
feasible solution, 189 
simplex method, 191 
slack variable, 190 
loop in graph, 223 


matrix 
adjacency, 222 
binary, 94 
characteristic equation, 46 
characteristic polynomial, 46 
characteristic root, 42 
check, 95 
closed loop, 154 
companion form, 173 
controllability, 143, 144, 145, 159 
determinant of, 45, 142, 165 
diagonal, 38 
diagonalization, 42 
difference equation, 36 
differential equation, 145 
eigenvalue, 42, 43 
eigenvector, 43 
inverse, 41, 142 
Leslie, 51 
non-singular, 142 
observability, 150, 167 




principal diagonal, 38, 161 
product, 39 
rank, 160 
singular, 143 
symmetric, 222 
transpose, 168, 174 
triangular, 161 
unit, 41, 97 
minimal 
connector, 218 
spanning tree, 217 
minimum distance, 88 
modular arithmetic, 84 
modulo, 83 
modulus, 13 


nearest neighbour (NN) decoding, 79 
network, 204 

Newton’s law, 129 

node of graph, 204 

non-singular matrix, 142 

northwest corner method, 196 


objective function, 226 
observability, 148 

matrix, 150, 167 
optimal control, 131, 224 
output, 127 


parity code, 80 

partial 
derivative, 227 
fractions, 31 

path, 205 

pivot, 161 

Prim’s algorithm, 218 

principal diagonal, 38, 161 


rabbit population model, 6, 147, 153, 157 
rank of matrix, 160 

recurrence relation, 2 

redwood forest model, 37, 170 

repetition code, 79, 89


search method, 179
second order 
difference equation, 11 
differential equation, 132 
simple harmonic motion, 132 
simplex method, 191 
singular matrix, 143 
slack variable, 190 
spanning tree, 216 
minimal, 217 
state 
variable, 130 
vector, 130 
subgraph, 216 
syndrome, 103 
decoding, 103 


transportation models, 195 
northwest corner method, 196 

transpose of matrix, 168, 174 

travelling salesperson problem, 220 

tree, 212 

triangle inequality, 87 

triangular matrix, 161 


unit matrix, 41, 97 
Universal Product code, 123 


vector, 4 
control, 130 
state, 130 
Venn diagram, 108 
vertex of graph, 204 


walk, 205, 223 
word, 77 


z-transform, 24 
inverse of, 30 
pairs, 27 

Zip code, 82, 121 


Applications
of
Mathematics


Emphasizing discrete models using difference equations and matrix 
representations, this book plays down the importance of calculus and 
differential equations. Realising that many students are not attracted to 
traditional applied mathematics, with its bias towards mechanics, the 
author uses modern and interesting illustrative examples. 


Contains a unique combination of topics, including error-correcting 
codes, optimization, and control theory. 


Focuses on practical applications in business, commerce, information 
technology and the environment - for example, understanding 
supermarket bar codes or planning a cable TV network 


Provides numerous worked examples and class-tested problems 
throughout, complete with answers. 


Uses an informal and readable approach, so as to be accessible to a 
wide range of students. 


Written by a well-known authority in the field. 


Stephen Barnett has been researching, writing and 
teaching in the areas covered by this book for over 35 years. He is 
currently an Honorary Professor in the Department of Applied
Mathematical Studies at the University of Leeds. Professor Barnett 
gained his PhD from Loughborough University and his DSc from the 
University of Manchester; he is a Chartered Engineer. He has published 
over 120 research papers and seven books in the area of applied 
mathematics.


